ABSTRACT


This publication provides the algorithmic definitions, performance characterizations and application notes for a high-performance adaptive noiseless coding
module. Subsets of these algorithms are currently under development in custom VLSI 
at three NASA centers. This report extends the generality of coding algorithms recently 
reported. 

The module incorporates a powerful adaptive noiseless coder for Standard 
Data Sources (i.e., sources whose symbols can be represented by uncorrelated nonnegative integers, where smaller integers are more likely than larger ones).
Coders can be specified to provide performance close to the data entropy over any 
desired Dynamic Range (of entropy) above 0.75 bit/sample. This is accomplished by 
adaptively choosing the best of many efficient variable-length coding options to use on 
each short block of data (e.g., 16 samples). All code options used for entropies above 
1.5 bits/sample are "Huffman Equivalent," but they require no table lookups to 
implement. 

The coding can be performed directly on data that have been preprocessed to 
exhibit the characteristics of a Standard Source. Alternatively, a built-in predictive 
preprocessor can be used where applicable. This built-in preprocessor includes the 
familiar one-dimensional predictor followed by a function that maps the prediction 
error sequences into the desired standard form. Additionally, an external prediction 
can be substituted if desired. 

This report further addresses a broad range of issues dealing with the interface 
between the coding module described here and the data systems it might serve. 
These issues include: multidimensional prediction, archival access, sensor noise, rate 
control, code rate improvements outside the module, and the optimality of certain 
internal code options. 
SOME PRACTICAL UNIVERSAL NOISELESS
CODING TECHNIQUES, PART III, MODULE PSI14,K+ 


I. INTRODUCTION 

References 1-5 provide the development and analysis of some practical 
adaptive techniques for efficient noiseless (lossless) coding of a broad class of data 
sources. Specifically, algorithms were developed for efficiently coding discrete 
memoryless sources that have known symbol probability ordering but unknown 
probability values. General applicability of these algorithms is obtained because most 
real data sources can be simply transformed into this form by appropriate reversible 
preprocessing. 

The applicability of noiseless coding to several high data rate NASA 
instruments has recently fostered the definition of a specific noiseless coding "module" 
called PSI14,K+ [6]. Extensions and modifications of key adaptive coding algorithms
from that earlier work are incorporated in this module, along with a standard
preprocessor (denoted by the "+"). This PSI14,K+ definition evolved in an attempt to
minimize hardware requirements without incurring a loss in performance. Two very
similar subsets of the module have been implemented as CMOS VLSI chip sets by the
Jet Propulsion Laboratory (JPL) and Goddard Space Flight Center (GSFC) in
collaboration with the University of Idaho's Microelectronics Center. The algorithmic 
definitions that specifically focus on these implementations are provided in Ref. 7 and 
are discussed here also. 

The GSFC/U. of Idaho implementations include full custom 1.0-µm CMOS
coder and decoder chips [8]. Both types of chip were recently tested successfully 
under laboratory conditions at input data rates up to 700 Mbits/s. By operating on 
sampled data quantized from 4 to 14 bits/sample, efficient coding performance can be 
expected over a range of entropies from 1.5 to 12.5 bits/sample. 

JPL developed and recently tested 1.6-µm CMOS coder chips based on both
gate-array and standard cell technologies [9]. These chips, while essentially
implementing the same subset of PSI14,K+ algorithms, have thus far included fewer
general-purpose features than the corresponding GSFC/U. of Idaho coder chip. The
Comet Rendezvous/Asteroid Flyby (CRAF)/Cassini project has recently initiated efforts
to flight qualify an upgraded version of the gate-array design.

The primary purpose of this report is to present the functional, algorithmic and 
performance characteristics embodied in the most general PSI14,K+ module 
definition. Appropriate material from earlier references will be consolidated here in 
support of that presentation. 

The first chapter provides the motivation and technical framework for a
PSI14,K+ module specification. This includes the basic notational conventions used throughout,
performance goals and a key partitioning of the coding process into separate
preprocessing and adaptive variable-length coding steps.

Chapters II and III focus individually on the specific PSI14,K+ module
requirements for preprocessing and adaptive variable-length coding, respectively.

Finally, Chapter IV addresses a broad range of issues dealing with the interface
between a PSI14,K+ module and the data systems it might serve, including:

• Multidimensional prediction. 

• Archival access. 

• Sensor noise. 

• Rate control. 

• Code rate improvements outside the module. 

• Optimality of certain internal code options. 


BACKGROUND, REVIEW AND ORIENTATION 

The general form of a noiseless coding module, from Refs. 3-7, is given in Fig. 1.
We will use it to reintroduce notation and focus on the goals of this report.
Fig. 1. General Coding Module


As shown, the coding process consists of two steps: 

1) Reversible preprocessing of a block of data samples X into another data
block $\delta^n$ that has certain standard characteristics, and

2) Using adaptive variable-length coding to efficiently represent the
standard source $\delta^n$ block produced by Step 1.

Ultimately we will provide the definitions and performance characteristics that
convert this general-purpose coding module into PSI14,K+, which includes the subset
described in Ref. 7.
Notation


In the figure, the overall coding process is specified by the "code operator"

$$\Psi_a^b[\,\cdot\,] \tag{1}$$

That is, $\Psi_a^b[\,\cdot\,]$ operates on X (and possibly a priori or side information) to produce the
coded sequence $\Psi_a^b[X]$. This form of notation (subscripting and superscripting the
Greek letter $\Psi$) was introduced in Ref. 3, and we will add to the list here. However, for
space and convenience we will often call out a particular $\Psi_a^b[\,\cdot\,]$ by the English form
PSIab. For example, $\Psi_{14,K}[\,\cdot\,]$ would become PSI14,K and

$$\Psi_{14,K}[X] \equiv \mathrm{PSI14,K}[X]$$

The reversibility of these operators requires that, given PSIab[X] and any a
priori or side information used in the coding process, the original data block X can be
recovered precisely.

Sequences vs. Samples. Where appropriate, we will emphasize that a 
quantity is a "sequence" of bits or samples by placing a tilde (~) over that quantity. 

Concatenation. If A and B are two sequences of samples, then we can form a 
new sequence C by running them back-to-back as 


$$C = A * B \tag{2}$$

and using the asterisk to indicate concatenation. However, the asterisk will be omitted 
occasionally when no confusion should result. 

Length of a Sequence. The function $\ell(\cdot)$ will be used to specify the length
of a sequence. For example, if A is a binary sequence, then

$$\ell(A) \tag{3}$$

denotes its length in bits. Without any anticipated confusion, if A is nonbinary, we take
$\ell(A)$ as either the length of A in samples or the length of a standard fixed-length
binary representation for A in bits, whichever is more useful in context.

Estimated length. To indicate an "estimate" of the length of some sequence,
we will use $\gamma(\cdot)$, appropriately subscripted and superscripted. For example, we have

$$\gamma_a^b(X) \approx \ell(\mathrm{PSIab}[X]) \tag{4}$$

as an estimate of the length of coded sequence PSIab[X].

Standard Source $\delta^n$

We can better understand the function of both preprocessor and coder by
understanding the idealized Standard Source for the sequences

$$\delta^n = \delta_1 \delta_2 \cdots \delta_J \tag{5}$$

described as follows:

1) The J samples of $\delta^n$ are values from the set of the nonnegative integers

$$0, 1, 2, \ldots, q-1 \tag{6}$$

2) The samples of $\delta^n$ have the probability distribution

$$P_\delta = \{p_0, p_1, \ldots, p_{q-1}\} \tag{7}$$

3) The $\{p_i\}$ exhibit the ordering

$$p_0 \ge p_1 \ge p_2 \ge \cdots \ge p_{q-1} \tag{8}$$

as illustrated in Fig. 2, and

4) The samples of $\delta^n$ are independent (uncorrelated) with
themselves and any "available" a priori or side information. (9)
Fig. 2. $\delta^n$ Sample Probabilities (curves for Low Activity and High Activity data over sample values 0 to 10)


While idealized, these conditions can easily be well approximated for many 
practical problems. In any case, it is the preprocessor's task to achieve and maintain 
these conditions as closely as possible. Let's look at the consequence for the second 
step in Fig. 1 (coding). 

By (6), the coder always has to deal with the same alphabet (with the exception 
of its size, q), regardless of the originating source. 

When using any specific variable-length code, a maximum reduction in rate is
obtained by using the shortest code words for the most frequently occurring symbols
and the longer code words for the less likely symbols. Then by (8), code words should
always be assigned to a preprocessed $\delta^n$ sequence so that the shortest code words go
to the smallest integers. This is the best assignment for any $P_\delta$ meeting (8) and any
variable-length code. However, it says nothing about whether a particular code is the
best one to use.
The condition in (9) means that the burden of making the most of data
correlation and a priori knowledge is placed on the preprocessor. If correlation still
exists in $\delta^n$, then the preprocessor can probably be improved, yielding $\delta^n$
distributions in which the smaller integers occur more frequently.


Coder Performance 

The entropy of a particular distribution $P_\delta$ is given by

$$H(P_\delta) = H_\delta = -\sum_i p_i \log_2 p_i \quad \text{bits/sample} \tag{10}$$

If $P_\delta$ is unchanging, then $H_\delta$ is a bound to the best performance of any coding
algorithm that follows. Assuming that preprocessing condition (9) has been met, $H_\delta$
is also a bound to the overall performance of a coding module. For a given $P_\delta$, the
best single variable-length code can be derived from the Huffman algorithm [10].
Unfortunately, the real world hardly ever provides $P_\delta$ that do not change.

In practice, the idealized preprocessing conditions in (6)-(9) can usually be
well approximated and maintained, but they change over time (see, for example, Fig.
2). Consequently, the real practical problem facing the coder of preprocessed $\delta^n$ is to
maintain efficient performance as $P_\delta$ changes. That is, usually the full burden of
adapting to nonstationary data characteristics can be placed on the coder.

We say that a coder is efficient if it obtains performance

"close to" an average measured entropy, $\hat{H}_\delta$, which will vary as $P_\delta$
does. (11)

To bound performance in the classic sense, $\hat{H}_\delta$ must be measured over a span
that is both long enough to be statistically significant and short enough to catch the
real statistical variations. The real world is full of compromises, so $\hat{H}_\delta$ should generally
be viewed as a guide to good performance rather than a bound.
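As a rough illustration of measuring entropy over a span, the sketch below applies Eq. 10 to the relative frequencies observed in one span of preprocessed samples. The function name `empirical_entropy` and the sample span are hypothetical, and the sketch assumes the observed frequencies stand in for $P_\delta$; for short spans the result is only a noisy guide, which is the compromise described above.

```python
import math
from collections import Counter

def empirical_entropy(samples):
    """Eq. 10 with each p_i replaced by the relative frequency of i in `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# A hypothetical span of preprocessed samples dominated by small integers.
span = [0, 0, 1, 0, 2, 1, 0, 3, 0, 1, 0, 0, 2, 0, 1, 0]
print(f"{empirical_entropy(span):.3f} bits/sample")  # 1.592 bits/sample
```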

The term "close to" leaves some room for interpretation; it usually means
within 0.2 to 0.3 bit/sample of $\hat{H}_\delta$. Thus, one efficient coder might be "more" efficient than
another under certain conditions. At very low entropies, which are not of concern to us
here, being within 0.2 bit/sample of $\hat{H}_\delta$ could not be considered efficient.

Dynamic Range. We will often be interested in the range of entropies over 
which a particular coder can be viewed as efficient in the sense just described. This is 
called a coder's Dynamic Range (of efficient performance). 

Application Specific Goals. The performance goal for the variable-length 
coder is summarized graphically in Fig. 3. 



Fig. 3. Desired Average Coder Performance 
for a Specific Application 


Graphically, the figure says that a coder's Dynamic Range should fall within the 
application's expected range of entropies. 

The "Don't Care" regions simply indicate that entropies outside the application's
range are unexpected, and so concern for good performance in these areas is not
critical. However, real-world problems often produce transient situations that don't fit
the norm. It is a good idea to build in some additional robustness (a larger Dynamic
Range) to deal with such transients.

GETTING MORE SPECIFIC 

Figure 4 replaces the general coding module of Fig. 1 with the next level of 
detail. We can begin to unveil module PSI14,K+. 



Fig. 4. Module PSI14,K+ High-Level
Functional Block Diagram


A specific predictive preprocessor that probably offers the broadest applicability
to real problems will be discussed in Chapter II. The specific adaptive variable-length
coder PSI14,K will be treated in Chapter III. But certain parameters and desirable
features can best be introduced here without the additional detail.

Input 

The input X on the left of Fig. 4 is a J-sample data block containing n-bit
samples. The built-in preprocessor will convert X into the standard form $\delta^n$, which is
also a J-sample sequence of n-bit samples. As already noted, $\delta^n$ generated in this
way will satisfy or closely approximate the desired standard preprocessing conditions
[6].
To Preprocess or Not To Preprocess

For those situations where the use of an external preprocessor is desirable, 
data can be entered directly into the coder, as shown. This option is functionally 
indicated in Fig. 4 by a logic signal "E1" controlling a switch. The inclusion of this
feature considerably broadens the module's applicability. 

Parameter Ranges 

A discussion of parameter K, shown in Fig. 4 as externally controlling the coder PSI14,K,
will be provided in Chapter III. It suffices to note here that incrementing K by 1
will shift the Dynamic Range upward by 1 bit/sample. The number of K options 
included in a particular implementation depends on the goals and constraints of that 
implementation. Additional means for externally shifting the Dynamic Range will be 
discussed in Chapters III and IV. 

Input Bits/Sample, n. There are numerous examples of real applications
that could make use of the algorithms embodied in the PSI14,K+ coding module.
The number of bits of data quantization for these examples lies in the range

$$2 \le n \le 16 \tag{12}$$

with current project-driven interest in n = 8, 12 and 14.

Block Size, J. Similarly, there are good reasons to consider block sizes over
the range

$$1 \le J \le 16 \tag{13}$$

although the vast majority of applications could do just fine with a fixed J = 16.

Estimate. It is valuable to have a count of either the actual number of bits to
code an individual data block or an estimate, as in (4). Thus, Fig. 4 includes

$$\gamma_{14,K}(\cdot)$$

as a desirable output of the PSI14,K+ module. Such estimates can be used to guide
decisions that are external to the module.

Other. Logic signal E2 and data signal $\hat{x}_i$ are shown for completeness. They
are discussed in the next chapter.





II. BUILT-IN PREPROCESSOR


A functional block diagram of the built-in PSI14,K+ Module preprocessor is
shown in Fig. 5. It is derived directly from the imaging preprocessor in Ref. 5.



Predictor 

The first part of this preprocessor is a very simple predictor consisting of a
single sample delay element. With $x_i$ as the ith sample in an input block X, this delay
element "predicts" that $x_i$ equals the previous sample:

$$\hat{x}_i = x_{i-1} \tag{14}$$

Such a predictor is broadly used as a one-dimensional (1-D) predictor. For
many problems, no other predictor need be considered. However, the switch
controlled by logic signal "E2" provides a means for using an arbitrary external
prediction, when desirable or necessary. The switch also serves the dual purpose of
providing the first prediction of a data block when the internal sample delay has not
been initialized (e.g., for the first sample of an image line).

An example of an external two-dimensional (2-D) predictor interfacing with the
module's preprocessor is illustrated in Fig. 6. The current sample shown is the mth
sample of the ith line of a raster image, $x_{i,m}$. An external predictor predicts that the
value of $x_{i,m}$ will lie halfway between the sample above, $x_{i-1,m}$, and the previous
sample in the same line, $x_{i,m-1}$.


Fig. 6. External 2-D Predictor Interfacing With Module Preprocessor
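The Fig. 6 predictor can be sketched as follows. This is a minimal Python illustration, not the module's interface: the function name `predict_2d`, the list-of-lists image layout, and the use of integer floor division to compute "halfway" are all assumptions, chosen so that the prediction stays an integer in the same n-bit range as the data.

```python
def predict_2d(image, i, m):
    """Hypothetical external 2-D predictor: halfway between the sample above,
    image[i-1][m], and the previous sample in the same line, image[i][m-1].
    Floor division is an assumed rounding rule."""
    above = image[i - 1][m]
    left = image[i][m - 1]
    return (above + left) // 2

image = [
    [10, 12, 14, 15],
    [11, 13, 15, 16],
]
# Prediction errors for line 1, skipping sample 0, which has no left neighbor.
errors = [image[1][m] - predict_2d(image, 1, m) for m in range(1, 4)]
print(errors)  # [2, 2, 1]
```

The first sample of each line, and the whole first line, would need a separate rule; in the module, such a prediction could be supplied through the E2 switch described above.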
Error Signal


The difference between any sample and its prediction produces the error signal

$$\Delta_i = x_i - \hat{x}_i \tag{15}$$

Sequences of $\Delta_i$ tend to display the unimodal distribution in Fig. 5, so that the condition

$$\Pr[\Delta_i = 0] \ge \Pr[\Delta_i = -1] \ge \Pr[\Delta_i = +1] \ge \Pr[\Delta_i = -2] \ge \cdots \tag{16}$$

is consistently well approximated.¹
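A minimal Python sketch of the built-in delay predictor and error signal, Eqs. 14 and 15, is given below. The function name and the parameter `first_prediction` are hypothetical; `first_prediction` stands in for the value supplied through the E2 switch when the delay element has not been initialized.

```python
def delay_prediction_errors(block, first_prediction):
    """Eqs. 14-15: predict x_i by x_{i-1} and return Delta_i = x_i - x_hat_i.
    `first_prediction` seeds the prediction of the first sample."""
    errors = []
    prediction = first_prediction
    for x in block:
        errors.append(x - prediction)  # Delta_i = x_i - x_hat_i
        prediction = x                 # the delay element: next x_hat is this sample
    return errors

print(delay_prediction_errors([100, 101, 101, 99, 100, 104], first_prediction=100))
# [0, 1, 0, -2, 1, 4]
```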

By using all available a priori and side information, the best predictor would 
produce an uncorrelated sequence of Aj with generally the smallest errors (so that the 
error distribution is more peaked around zero). Ultimately this would produce the 
lowest code rate. The module’s external predictor option allows one to develop and 
use such a predictor if desired. However, the simple built-in delay predictor may come 
very close for many problems. 

A rule of thumb for imaging data is that a 2-D predictor, such as in Fig. 6, may 
provide as much as 0.5 bit/sample net improvement in code rate over the simple delay 
predictor. In general, the benefits of improved code rate must be weighed against the 
implications of added complexity for each individual compression system 
implementation. 


¹Note that reversing the position of positive and negative differences in (16) is just as
good an assumption. From a hardware-implementation point of view, there is a slight 
advantage to the arrangement in (16) [®1. Except where noted otherwise, the ordering 
of differences as in (16) will be assumed. 
Mapping into the Integers


The final preprocessor step is to map the prediction errors, $\{\Delta_i\}$, into the
nonnegative integers so that the probability-ordering condition in (8) is well approximated.
This is accomplished for the conditions in (16) by the basic mapping operation in
Table 1 and Eq. 17.

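Table 1. Basic Mapping of $\Delta_i$ into the Integers, $\delta_i$

Prediction Error, $\Delta_i$      Integer, $\delta_i$
       0                               0
      -1                               1
      +1                               2
      -2                               3
      +2                               4
      -3                               5
       ⋮                               ⋮

$$\delta_i = \begin{cases} 2|\Delta_i| - 1 & \text{if } \Delta_i < 0 \\ 2\Delta_i & \text{if } \Delta_i \ge 0 \end{cases} \tag{17}$$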
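Eq. 17 and its inverse can be sketched as below (Python; `basic_map` and `basic_unmap` are illustrative names, not part of the module definition). The inverse shows why the mapping is reversible: odd integers came from negative errors and even integers from nonnegative ones.

```python
def basic_map(delta):
    """Eq. 17: errors 0, -1, +1, -2, +2, ... map to 0, 1, 2, 3, 4, ..."""
    return 2 * abs(delta) - 1 if delta < 0 else 2 * delta

def basic_unmap(d):
    """Inverse of Eq. 17: odd integers came from negative errors."""
    return -(d + 1) // 2 if d % 2 else d // 2

errors = [0, -1, 1, -2, 2, -3]
print([basic_map(e) for e in errors])  # [0, 1, 2, 3, 4, 5], as in Table 1
```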

But condition (16) cannot be true for all $\Delta_i$ when the signal values are close to
the boundaries of the signal dynamic range. For example, if $\hat{x}_i = x_{i-1} = 0$, the error
$\Delta_i = x_i - 0 = x_i \ge 0$, so that negative errors cannot occur. Then for the mapping in (17)

$$\Pr[\delta_i = \text{odd number} > 0] = 0$$
and so condition (8) can't be true either. Reference 5 provides an alternative mapping
that takes advantage of these signal dynamic range constraints to avoid this problem.
This mapping can be rewritten here for the ordering in (16) as

$$\delta_i = \begin{cases} 2\Delta_i & 0 \le \Delta_i \le \theta_i \\ 2|\Delta_i| - 1 & -\theta_i \le \Delta_i < 0 \\ \theta_i + |\Delta_i| & \text{otherwise} \end{cases} \tag{18}$$

where for n-bit quantized samples

$$\theta_i = \min(\hat{x}_i,\; 2^n - 1 - \hat{x}_i) \tag{19}$$

For our example where $\hat{x}_i = 0$, if $\Delta_i = +6$ occurred, Eq. 18 would produce $\delta_i = 6$,
whereas Eq. 17 would produce $\delta_i = 2(6) = 12$. Appendix A provides additional
information on the alternative mapping functions.
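A sketch of the alternative mapping in (18) and (19), again in Python with illustrative names (`theta`, `alt_map`). The prediction $\hat{x}_i$ is assumed to be an integer in $[0, 2^n - 1]$. The usage lines reproduce the $\Delta_i = +6$ example above and confirm, by exhaustion for n = 4, that for each prediction the possible n-bit inputs map one-to-one onto $0, \ldots, 2^n - 1$.

```python
def theta(x_hat, n):
    """Eq. 19: distance from the prediction to the nearer end of the n-bit range."""
    return min(x_hat, 2**n - 1 - x_hat)

def alt_map(delta, x_hat, n):
    """Eq. 18 for the ordering 0, -1, +1, -2, ... of (16)."""
    t = theta(x_hat, n)
    if 0 <= delta <= t:
        return 2 * delta
    if -t <= delta < 0:
        return 2 * abs(delta) - 1
    return t + abs(delta)

# Delta_i = +6 with x_hat_i = 0 (8-bit data): Eq. 18 gives 6, Eq. 17 would give 12.
print(alt_map(6, x_hat=0, n=8))  # 6

# For every prediction, the n-bit inputs map one-to-one onto 0 .. 2**n - 1.
n = 4
for x_hat in range(2**n):
    mapped = sorted(alt_map(x - x_hat, x_hat, n) for x in range(2**n))
    assert mapped == list(range(2**n))
```

With $\hat{x}_i = 0$ the function returns $x_i$ unchanged, which is the pass-through behavior discussed below under Data Line Advantage.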

Performance Advantage. The performance benefit of using (18) and (19) 
instead of (17) is application dependent. Several tests on 8-bit imaging data showed 
typical improvements from 0.01 to 0.03 bit/sample. 

Quantization Advantage. But this mapping has an additional advantage.
Whereas n-bit data will produce (n+1)-bit prediction errors, $\delta_i$ is constrained by (18) and
(19) to only n bits. That is

$$0 \le \delta_i \le 2^n - 1 \tag{20}$$

Data Line Advantage. Yet another advantage appears. Note that if an
external prediction value $\hat{x}_i$ in Fig. 5 is fixed at zero, that is

$$\hat{x}_i = 0 \quad \text{for all } i$$

and so

$$\Delta_i = x_i$$

then by (18) and (19)

$$\theta_i = 0, \qquad \delta_i = |\Delta_i| = x_i$$

That is, referring again to Fig. 5, any input $x_i$ is passed directly through the
preprocessor, unchanged. This accomplishes the function of the Fig. 4 data line
controlled by logic signal E1/$\overline{\mathrm{E1}}$. The data line can be omitted.
III. ADAPTIVE VARIABLE-LENGTH CODING


The general-purpose adaptive coder was designated as PSI11 in Ref. 3. Its 
functional form is given in Fig. 7. 



Fig. 7. General-Purpose Adaptive Coder, PSI11 

The input to PSI11 is a J-sample preprocessed data block $\delta^n$ having the
desired characteristics described in (5)-(9).

The output of PSI11 takes the form

$$\Psi_{11}[\delta^n] = \mathrm{ID}(id) * \Psi_{\alpha_{id}}[\delta^n] \tag{21}$$

where id takes on the nonnegative integer values

$$0 \le id \le N-1 \tag{22}$$

and

$$\mathrm{ID}(id) \tag{23}$$

is a binary string designation for id.

N is the number of code options available for coding data block $\delta^n$. The
(subscript and/or superscript) designation for the ith coder is $\alpha_{i-1}$. For example, if a
coder named "PSI1,5" is the third coder in a list of coder options, then $\alpha_2 = 1,5$.

Decision operations, discussed later, determine which coder is best to use and
designate this choice with decision or "identifier" number, id. This directs the PSI11
adaptive coder to output the coded sequence

$$\Psi_{\alpha_{id}}[\delta^n] \tag{24}$$

prefaced with a binary code for the identifier, ID(id), to tell a decoder which type of
coded sequence follows.

Unless noted otherwise, we will assume that the identifier code, ID, is an m-bit
fixed-length code with²

$$m = \lceil \log_2 N \rceil \tag{25}$$

Typically N is chosen so that $N = 2^m$.

Note that we have modified the notation slightly from Refs. 3-5 to clarify the 
distinctions among a coder's decision, its binary representation, and the 
corresponding coder option designations. 


²$\lceil x \rceil$ is the smallest integer greater than or equal to x.
Also shown in the figure is an estimate of the coded length of any internal code
option i. By (4)

$$\gamma_{\alpha_i}(\delta^n) \approx \ell(\Psi_{\alpha_i}[\delta^n])$$

and therefore, an estimate for PSI11 itself is

$$\ell(\Psi_{11}[\delta^n]) \approx \gamma_{11}(\delta^n) = \ell(\mathrm{ID}(id)) + \gamma_{\alpha_{id}}(\delta^n) \tag{26}$$

where typically $\ell(\mathrm{ID}(id)) = m$ from (25).

PSI14

We now define the individual coder options that convert the general form PSI11
into PSI14.

Fundamental Sequence, PSI1

Define the code word fs[i] by

$$\mathrm{fs}[i] = \underbrace{00\cdots0}_{i\ \text{zeroes}}1 \tag{27}$$

where $i \ge 0$ is an input integer. The length of "code word" fs[i] is

$$\ell_i = \ell(\mathrm{fs}[i]) = i + 1 \text{ bits} \tag{28}$$


Again, let

$$\delta^n = \delta_1 \delta_2 \cdots \delta_J$$

be an input sequence of samples meeting the preprocessing conditions described
earlier. Then, the coding of $\delta^n$ by using fs[ ] on each sample yields the "Fundamental
Sequence" of $\delta^n$
$$\mathrm{PSI1}[\delta^n] = \Psi_1[\delta^n] = \mathrm{fs}[\delta_1] * \mathrm{fs}[\delta_2] * \cdots * \mathrm{fs}[\delta_J] \tag{29}$$

That is, PSI1 denotes the application of the "fs" code in (27) to all the samples of
a sequence. The length of a Fundamental Sequence is

$$F_0 = \gamma_1 = \ell(\mathrm{PSI1}[\delta^n]) = J + \sum_{j=1}^{J} \delta_j \tag{30}$$

Note that the code defined in (27) is probably the simplest nontrivial variable-length
code there is. It is defined for all input alphabet sizes by the simple expression
in (27). This simplicity carries into both software and hardware implementations.³

Example. Let

$$\delta^n = \delta_1 \delta_2 \cdots \delta_{14} = 0\,0\,0\,0\,0\,4\,0\,0\,4\,9\,0\,0\,1\,0 \tag{31}$$

By applying the rules in (27) and (29), we get

$$\Psi_1[\delta^n] = 11111000011100001000000000111011 \tag{32}$$

and from (30), or by counting,

$$F_0 = 14 + (4 + 4 + 9 + 1) = 32 \text{ bits} \tag{33}$$
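The fs code, the Fundamental Sequence and its decoding fit in a few lines of Python. The function names below are illustrative; the example reproduces (31)-(33), and the decoder shows that the code is reversible without any table lookup, since each run of zeroes ends with a one.

```python
def fs(i):
    """Eq. 27: i zeroes followed by a one (i + 1 bits, Eq. 28)."""
    return "0" * i + "1"

def psi1(samples):
    """Eq. 29: the Fundamental Sequence, fs applied to every sample."""
    return "".join(fs(s) for s in samples)

def decode_fs(bits):
    """Invert PSI1: each run of zeroes terminated by a one is one sample."""
    samples, run = [], 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            samples.append(run)
            run = 0
    return samples

delta = [0, 0, 0, 0, 0, 4, 0, 0, 4, 9, 0, 0, 1, 0]  # Eq. 31
fs_bits = psi1(delta)
print(fs_bits, len(fs_bits))  # 11111000011100001000000000111011 32
```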


³Clearly, if q = max i, then (27) and (28) can be replaced by

$$\mathrm{fs}[q] = \underbrace{00\cdots0}_{q\ \text{zeroes}} \quad \text{and} \quad \ell_q = q \text{ bits}$$

We refer to this refinement in the fs code definition as the "Optimized FS Code." 
However, except for very small alphabets, the added performance benefits can be 
expected to be insignificant M*U. Thus, except where noted otherwise, we will 
presume the simpler definition in (27) and (28). 
Performance. The average performance for PSI1, based on measured results
from numerous data sources, is plotted in Fig. 8 as a function of entropy $H_\delta$ (Eq. 10).
Fig. 8. Average PSI1 Performance


Performance close to the entropy should not be a surprise since the fs[·] code
in (27) is clearly a Huffman code for some distribution. (The latter subject is investigated
later in conjunction with other coding operations in Ref. 11.) The Dynamic
Range for this code is the entropy region of approximately $1.5 \le H_\delta \le 2.5$ bits/sample.
Such a narrow range is typical for individual variable-length codes.
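One way to see the Huffman connection, offered here as an illustration rather than as the analysis of Ref. 11: fs[i] has length $i + 1 = -\log_2 p_i$ exactly when $p_i = 2^{-(i+1)}$, so for that (infinite-alphabet) distribution the average code word length equals the entropy:

$$\bar{\ell} = \sum_{i=0}^{\infty} p_i \ell_i = \sum_{i=0}^{\infty} (i+1)\,2^{-(i+1)} = \sum_{k=1}^{\infty} k\,2^{-k} = 2 = -\sum_{i=0}^{\infty} p_i \log_2 p_i = H_\delta \ \text{bits/sample}$$

This places the point of exact match at 2 bits/sample, in the middle of the Dynamic Range quoted above.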
Code Operator PSI0

From (29) and (30),

$$FS = \Psi_1[\delta^n] = c_1 c_2 \cdots c_{F_0} \tag{34}$$

denotes a Fundamental Sequence of length $F_0$ bits, where now $c_i$ denotes the ith bit.

Complement. The complement of FS is given as

$$\mathrm{COMP}[FS] = \overline{FS} = \bar{c}_1 \bar{c}_2 \cdots \bar{c}_{F_0} \tag{35}$$

where $\bar{c}_i = 1 - c_i$ is the complement of the ith bit of FS.

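Third Extension. The third extension of $\overline{FS}$ is given as

$$\mathrm{Ext3}[\overline{FS}] = (\bar{c}_1 \bar{c}_2 \bar{c}_3) * (\bar{c}_4 \bar{c}_5 \bar{c}_6) * \cdots * (\bar{c}_{F_0}\,0\,0) \tag{36}$$

where the bits of $\overline{FS}$ have been grouped into binary 3-tuples (the last 3-tuple is
completed by adding dummy zeroes, as necessary). Thus, we have

$$a = \ell(\mathrm{Ext3}[\overline{FS}]) = \lceil F_0/3 \rceil \ \text{3-tuples} \tag{37}$$

and

$$b = \ell(\mathrm{Ext3}[\overline{FS}]) = 3\lceil F_0/3 \rceil = 3a \ \text{bits} \tag{38}$$

We are interested in $\mathrm{Ext3}[\overline{FS}]$ as a source of binary 3-tuples, so we write

$$\mathrm{Ext3}[\overline{FS}] = \beta_1 * \beta_2 * \cdots * \beta_a \tag{39}$$

where the $\beta_i$ are the individual 3-tuples that make up (36).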
Coding $\mathrm{Ext3}[\overline{FS}]$. Code operator PSI0 is defined as the coding of $\mathrm{Ext3}[\overline{FS}]$ in
(39) by the variable-length code in Table 2.⁴


Table 2. 8-Word 3-Tuple Code, cfs[$\beta_i$]

3-tuple $\beta_i$      Code word cfs[$\beta_i$]
     000                     1
     001                     001
     010                     010
     100                     011
     011                     00000
     101                     00001
     110                     00010
     111                     00011


Applying this code to the 3-tuples of $\mathrm{Ext3}[\overline{FS}]$ provides the result we are looking
for:

$$\Psi_0[\delta^n] = \mathrm{cfs}[\beta_1] * \mathrm{cfs}[\beta_2] * \cdots * \mathrm{cfs}[\beta_a] \tag{40}$$

Estimate. It can be shown that a bound and good estimate to $\ell(\mathrm{PSI0}[\delta^n])$ is

$$\gamma_0(\delta^n) = \lceil F_0/3 \rceil + 2(F_0 - J) \ge \ell(\Psi_0[\delta^n]) \tag{41}$$


⁴This particular code is slightly different from the one listed in Refs. 1-4. It has the same code
word lengths and thus results in the same performance. However, this arrangement
offers a hardware implementation advantage [12].
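Example. Let

$$\delta^n = 0\,0\,0\,1\,0\,0\,0\,2\,0\,0 \qquad (J = 10)$$

so that, by (29) and (35),

$$FS = 1110111100111 \quad (F_0 = 13), \qquad \overline{FS} = 0001000011000$$

and

$$\mathrm{Ext3}[\overline{FS}] = (0\,0\,0)(1\,0\,0)(0\,0\,1)(1\,0\,0)(0\,0\,0) \tag{46}$$

Then, by applying the cfs[ ] code in Table 2, we get from (40)

$$\Psi_0[\delta^n] = 10110010111 \tag{47}$$

and

$$\ell(\Psi_0[\delta^n]) = 11 \text{ bits} \tag{48}$$

Observe that by using (41),

$$\gamma_0 = 5 + 2(13 - 10) = 11 = \ell(\Psi_0[\delta^n]) \tag{49}$$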
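The PSI0 steps (29), (35), (36), (39) and (40), together with the estimate (41), can be sketched in Python as below. The dictionary `CFS` transcribes Table 2; the function names are illustrative. The example reproduces (47)-(49). The bound in (41) follows from the code word lengths in Table 2: a 3-tuple containing w ones costs $1 + 2w$ bits for $w \le 2$ and 5 bits for w = 3, and $\overline{FS}$ contains exactly $F_0 - J$ ones (one for each zero of FS).

```python
# Table 2: 3-tuple -> code word.
CFS = {
    "000": "1", "001": "001", "010": "010", "100": "011",
    "011": "00000", "101": "00001", "110": "00010", "111": "00011",
}

def psi0(samples):
    fs_bits = "".join("0" * s + "1" for s in samples)          # FS, Eq. 29
    comp = "".join("1" if b == "0" else "0" for b in fs_bits)   # complement, Eq. 35
    comp += "0" * (-len(comp) % 3)                              # dummy zeroes, Eq. 36
    tuples = [comp[i:i + 3] for i in range(0, len(comp), 3)]    # beta_1 .. beta_a, Eq. 39
    return "".join(CFS[t] for t in tuples)                      # Eq. 40

def gamma0(samples):
    f0 = len(samples) + sum(samples)                            # F_0, Eq. 30
    a = -(-f0 // 3)                                             # ceil(F_0 / 3), Eq. 37
    return a + 2 * (f0 - len(samples))                          # Eq. 41

delta = [0, 0, 0, 1, 0, 0, 0, 2, 0, 0]
code = psi0(delta)
print(code, len(code), gamma0(delta))  # 10110010111 11 11
```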

Average Performance. The average measured performance for PSI0 is 
added to the Fig. 8 results in Fig. 9. 
Fig. 9. Average PSI0 Performance


PSI0 performance becomes efficient at the lower entropies precisely where
PSI1 performance begins to pull away from the entropy line. This should not be
surprising; we are coding FS = PSI1[$\delta^n$] as a data source and using a Huffman
(3-tuple) code, cfs[ ]. When PSI1 starts becoming inefficient, FS must have some
redundancy left in it.


It is debatable how "efficiency" should be defined as entropies drop below 1
bit/sample. After all, at $H_\delta$ = 1 bit/sample, 0.1 bit/sample represents a 10% error,
whereas at $H_\delta$ = 5 bits/sample, 0.1 bit/sample represents only a 2% error. However,
we will make the practical assumption that PSI0 average performance can be called
efficient down to 0.75 bit/sample. We note also that PSI9 from Refs. 3 and 4 can
significantly improve performance as entropies become very low.

Thus, we take the Dynamic Range for PSI0 to be $0.75 \le H_\delta \le 1.5$ bits/sample.
The Unity Code Operator, PSI3

A trivial code operator is obtained by defining PSI3[$\delta^n$] as any fixed-length
binary representation of $\delta^n$. In the simplest case, we can take PSI3[$\delta^n$] as $\delta^n$ itself so
that

$$\Psi_3[\delta^n] = \delta^n \tag{50}$$


Operators PSI2 and PSI4 

We will only make brief note of these two code operators for historical purposes. 
They do not form a part of PSI14. 

PSI2 is very similar to PSI0. PSI2[$\delta^n$] is obtained by directly using the code in
Table 2 to code its 3-tuples PH 4 1. That is, perform all the same operations as for 
PSI0 but don't complement the Fundamental Sequence. The Dynamic Range for PSI2
is $3 \le H_\delta \le 4$ bits/sample.

PSI4, also called the "Basic Compressor," is an adaptive coder, as defined by
PSI11 in Fig. 7 and (21), which uses the options PSI0, PSI1, PSI2 and PSI3. Its
Dynamic Range is $0.75 \le H_\delta \le 4.0$ bits/sample, as illustrated in Fig. 10.
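The adaptive selection of (21)-(26) can be sketched for PSI4-style options as below (Python). This is an illustration under stated assumptions, not the module's decision logic: the ID assigned to each option (0 = PSI0, 1 = PSI1, 2 = PSI2, 3 = PSI3), the n-bit binary form used for PSI3, and choosing by actually coding every option rather than by comparing the estimates $\gamma$ are all assumptions.

```python
import math

# Table 2: 3-tuple -> code word.
CFS = {
    "000": "1", "001": "001", "010": "010", "100": "011",
    "011": "00000", "101": "00001", "110": "00010", "111": "00011",
}

def psi1(samples):
    """Fundamental Sequence, Eq. 29."""
    return "".join("0" * s + "1" for s in samples)

def code_3tuples(bits):
    """Pad with dummy zeroes and code each 3-tuple with Table 2."""
    bits += "0" * (-len(bits) % 3)
    return "".join(CFS[bits[i:i + 3]] for i in range(0, len(bits), 3))

def psi0(samples):
    """Table 2 applied to the complemented Fundamental Sequence."""
    return code_3tuples("".join("1" if b == "0" else "0" for b in psi1(samples)))

def psi2(samples):
    """Table 2 applied to the Fundamental Sequence without complementing."""
    return code_3tuples(psi1(samples))

def make_psi3(n):
    """Unity code, Eq. 50: each sample as an n-bit binary number."""
    return lambda samples: "".join(format(s, f"0{n}b") for s in samples)

def adaptive_code(samples, options):
    """PSI11 form, Eq. 21: m-bit ID of the shortest option, then its output.
    Ties go to the lowest ID."""
    m = math.ceil(math.log2(len(options)))                  # Eq. 25
    outputs = [option(samples) for option in options]
    best = min(range(len(options)), key=lambda i: len(outputs[i]))
    return format(best, f"0{m}b") + outputs[best], best

options = [psi0, psi1, psi2, make_psi3(8)]                  # hypothetical ID order
low = [0, 0, 0, 1, 0, 0, 0, 2, 0, 0]                        # low-entropy block
busy = [200, 13, 97, 255, 0, 64, 128, 7]                    # high-entropy block
for block in (low, busy):
    coded, best = adaptive_code(block, options)
    print(best, len(coded))
```

The low-entropy block selects PSI0 and the high-entropy block falls back to PSI3, mirroring how each option covers a different part of the PSI4 Dynamic Range.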
Fig. 10. Average Performance, PSI4 (horizontal axis: entropy $H_\delta$, bits/sample)

Split-Sample Modes 

We can limit some of the generality provided in the notation of Ref. 5 since we
are looking for a more specific result here.⁵

Again, let $\delta^n$ be a sequence of J preprocessed samples where now the "n"
means that these samples are quantized to n bits. Define the basic "Split-Sample"
Operator $SS^{n,k}$ by

$$SS^{n,k}[\delta^n] = \{M^{n,k},\ L^k\} \tag{51}$$

⁵For additional generality, Ref. 5 used a more general, q-subscripted notation for $SS^{n,k}[\,\cdot\,]$, $L^k$, $M^{n,k}$,
etc.
where

$$L^k = \text{J-sample sequence of all the } k \text{ least-significant bits of each } \delta^n \text{ sample} \tag{52}$$

and $M^{n,k}$ is simply the J-sample sequence of the n-k most-significant bits of each $\delta^n$
sample after removing the least-significant k. That is

$$M^{n,k} = \text{J-sample sequence of all the } n-k \text{ most-significant bits of each sample of } \delta^n \tag{53}$$

and where we note that

$$SS^{n,0}[\delta^n] = M^{n,0} = \delta^n \quad \text{and} \quad SS^{n,n}[\delta^n] = L^n = \delta^n$$

so that

$$0 \le k \le n \tag{54}$$

These operations are illustrated in Fig. 11. For future simplicity, we will call $SS^{n,k}$
simply "SPLIT."
Fig. 11. Basic Split-Sample Operator, $SS^{n,k}[\,\cdot\,]$ = SPLIT

Motivation. Let $H(\delta^n)$, $H(M^{n,k})$ and $H(L^k)$ denote the average entropies
associated with $\delta^n$, $M^{n,k}$ and $L^k$ sequences, respectively. It has been observed that

$$H(M^{n,k}) \approx H(\delta^n) - k \tag{55}$$

and

$$H(L^k) \approx k \tag{56}$$

provided that (approximately) $H(M^{n,k}) > 3$ bits/sample.

Equation 56 says that the least-significant bits are essentially random, and Eq. 55
says that removing k least-significant bits drops the entropy of what remains by k
bits/sample. More important, Eqs. 55 and 56 imply

$$H(M^{n,k}) + H(L^k) \approx H(\delta^n) \tag{57}$$
Thus, if both $M^{n,k}$ and $L^k$ sequences can be separately coded efficiently (close to the
entropy), we will succeed in coding the original sequence $\delta^n$ efficiently also.

Coding $L^k$ sequences efficiently is trivial; since they are random, their uncoded
form is already an efficient representation. For the $M^{n,k}$ sequences, we note that they
continue to retain the desired probability ordering of (8) as k is increased (remember
that $\delta^n$ is preprocessed by assumption). Thus, at some value of k, the entropy of M
sequences will drop low enough to lie within the Dynamic Range of code operators
such as PSI0, PSI1, PSI4, etc.

Operator PSIi,k. Split-Sample "code" operator PSIi,k is defined by⁶

$$ \Psi_{i,k}[\delta^n] = \Psi_i[M^{n-k}] * L^k \tag{58} $$

where * denotes concatenation. The structure of this operator is illustrated in Fig. 12.

Example for PSI1,k. The following should give a practical feel for the Split-Sample Mode concept.

Let

$$ \delta^5 = 10, 4, 3, 7, 5, 0, 2 \tag{59} $$

be a J = 7 sample sequence of 5-bit samples. In binary, $\delta^5$ becomes

$$ \delta^5 = 010|10,\ 001|00,\ 000|11,\ 001|11,\ 001|01,\ 000|00,\ 000|10 \tag{60} $$

where for future reference we have marked off the two least-significant bits of each sample with a vertical bar. Now let's try coding with PSI1,2. A block diagram showing the steps is provided in Fig. 13.

Fig. 12. Split-Sample Coder, PSIi,k (Eq. 58); $L^k$ = least-significant k bits

⁶ The order of $\Psi_i[M^{n-k}]$ and $L^k$ in (58) is reversed from the definition in Ref. 5. Statistically the order doesn't matter, but the arrangement in (58) appears to hold a hardware implementation advantage. It also provides a format consistent with Split-Sample modes generated before preprocessing, as discussed later, where there are additional advantages to this ordering.


From the figure we have

$$ \mathcal{L}(\Psi_{1,2}[\delta^5]) = 26 \text{ bits} \tag{61} $$

whereas a direct coding of $\delta^5$ using PSI1 would require

$$ \mathcal{L}(\Psi_1[\delta^5]) = \sum_{i=1}^{7} \delta_i + J = 31 + 7 = 38 \text{ bits} \tag{62} $$

Fig. 13. PSI1,2 Example. $\delta^5$ = 10,4,3,7,5,0,2 (integer form) = 01010,00100,00011,00111,00101,00000,00010 (binary form). PSI1,2 (see Fig. 12) splits this into $L^2$ = 10,00,11,11,01,00,10 and $M^3$ = 010,001,000,001,001,000,000 = 2,1,0,1,1,0,0 (integer form), which is coded as $\Psi_1[M^3]$ = 001,01,1,01,01,1,1 (Eq. 28).
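The bit counts in (61) and (62) can be checked with a short sketch. It assumes PSI1 is the Fundamental Sequence code used in Fig. 13 (a value m becomes m zeros followed by a 1) and builds PSI1,k as in (58): the Fundamental Sequence of the most-significant bits followed by the raw k-bit least-significant fields. Bits are kept as a string of '0'/'1' characters for readability; the helper names are illustrative only.

```python
def fundamental_sequence(values):
    """PSI1: encode each value m as m zeros followed by a single 1."""
    return "".join("0" * m + "1" for m in values)


def psi1_k(samples, k):
    """Sketch of PSI1,k per (58): PSI1[M^{n-k}] followed by L^k, uncoded."""
    msbs = [s >> k for s in samples]
    if k == 0:
        return fundamental_sequence(msbs)
    lsbs = "".join(format(s & ((1 << k) - 1), f"0{k}b") for s in samples)
    return fundamental_sequence(msbs) + lsbs


delta5 = [10, 4, 3, 7, 5, 0, 2]
coded = psi1_k(delta5, 2)
print(coded)                               # 001011010111 followed by 10001111010010
print(len(coded))                          # 26 bits
print(len(fundamental_sequence(delta5)))   # 38 bits = 31 + 7
```

The first 12 bits are the coded most-significant bits, $\Psi_1[M^3]$, and the last 14 are $L^2$ passed through unchanged.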

Performance for PSI4,k. Again for historical purposes, the performance for Split-Sample modes using PSI4 (as the coder of most-significant bits) is illustrated in Fig. 14 [1]-[4].

The figure shows that each increase in k produces a code option PSI4,k whose 
individual Dynamic Range has been shifted upward by 1 bit/sample. Further, there is a 
PSI4,k Split-Sample option to fit each 1-bit entropy range above 4 bits/sample. The 
breakpoints occur near integer entropy values. 
Fig. 14. Average Performance, PSI4,k


Performance for PSI1,k. It should be no surprise that the same type of curves result when only the simple code operator PSI1 is used to represent the most-significant bits. Of course, the performance curves start at lower entropies since the efficient operating range for PSI1 alone is between 1.5 and 2.5 bits/sample (versus about 0.75 to 4 bits/sample for PSI4).


We will not study the individual characteristics of each PSI1,k. Instead, the next section investigates the composite performance obtained by an adaptive coder which has many PSI1,k options from which to choose. This, in fact, is the basis for code operator PSI14. For detailed performance curves of the individual PSI1,k, consult Yeh [11]. In fact, she has shown an equivalence between PSI1,k Split-Sample Modes and Huffman codes. This is discussed further in Appendix B.
CODE OPERATOR PSI14


Definition 

PSI14 is defined as a version of adaptive coder PSI11 in (19), which is repeated here as

$$ \Psi_{14}[\delta^n] = ID(id) * \Psi_{\alpha_{id}}[\delta^n] \tag{63} $$

and which uses three or more "adjacent" code options from the list in (64) and code option PSI3 (or PSIF).⁷

$$ \Psi_0,\ \Psi_1,\ \Psi_{1,1},\ \Psi_{1,2},\ \ldots \qquad \text{at positions } \lambda = 0,\ 1,\ 2,\ 3,\ \ldots \tag{64} $$


Thus, a particular N-option PSI14 configuration is completely specified by identifying the first option in this list (from the left) to be included. And given this list, this first option can be specified parametrically by its position in the list

$$ \lambda \ge 0 \tag{65} $$

as shown in (64). For example, $\lambda = 2$ means the first code option to be used is PSI1,1.

Starting Option. The specification of this starting option can be made quite compact if we note that both PSI0 and PSI1 are really limiting forms of Split-Sample modes PSIi,k with k = 0. That is,

$$ \Psi_0 = \Psi_{0,0} \tag{66} $$


and

$$ \Psi_1 = \Psi_{1,0} \tag{67} $$

⁷ PSIF is the "Fast Compressor" discussed in Ref. 5. In this application, PSIF would replace PSI3 as an adaptive "backup coder" (see Eq. 50). However, most typical applications would receive no statistical benefit.


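We can quite generally define a PSI14 Starting Option as

$$ \Psi_{i(\lambda),\,k(\lambda)} \qquad \text{for } 0 \le \lambda \le n + 1 \tag{68} $$

where

$$ i(\lambda) = \begin{cases} 0 & \text{if } \lambda = 0 \\ 1 & \text{otherwise} \end{cases} \tag{69} $$

$$ k(\lambda) = \begin{cases} 0 & \text{if } \lambda = 0 \\ \lambda - 1 & \text{otherwise} \end{cases} \tag{70} $$

and n is the input bits/sample.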

Numbering Each Option. We must assign an identifier number to each of the N codes used in PSI14. By definition, we take

$$ id = N - 1 \quad \text{for PSI3} \tag{71} $$

Then we take

$$ id = 0 \tag{72} $$

for the starting option at position $\lambda$ in the list of (64). All other options (to the right of this starting option) are assigned increasing id values as the list is traversed to the right. Parametrically, this is very straightforward. The idth option is

$$ \Psi_{i(\lambda + id),\,k(\lambda + id)} \tag{73} $$

for

$$ 0 \le \lambda \le n + 1 \quad \text{and} \quad 0 \le id \le N - 2 $$

Note that $\lambda$ can have n + 2 possible values (0 through n + 1); this is the maximum number of coders available. Then

$$ N \le (\text{max no. of coders}) - (\text{starting coder no.}) = (n + 2) - \lambda \tag{74} $$

Looking closely at (54), (71) and (73), we see that parameters N, $\lambda$, and n completely define any PSI14, including the identifier to use in (63). We will later see that these parameters also define the corresponding PSI14 Dynamic Range.
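Because N, $\lambda$ and n fix everything, the option list can be generated mechanically from (68)-(74). The sketch below does this in Python; it assumes the identifier is a fixed-length binary field of $\lceil \log_2 N \rceil$ bits, as in Table 3, and the names `psi14_options`, `i_of` and `k_of` are hypothetical.

```python
def i_of(lam):
    """i(lambda) from (69)."""
    return 0 if lam == 0 else 1


def k_of(lam):
    """k(lambda) from (70)."""
    return 0 if lam == 0 else lam - 1


def psi14_options(N, lam, n):
    """Return (id, binary ID, option name) rows for an N-option PSI14."""
    if not 0 <= lam <= n + 1:
        raise ValueError("the starting option needs 0 <= lambda <= n + 1, per (68)")
    if not 2 <= N <= (n + 2) - lam:
        raise ValueError("N violates the bound in (74)")
    width = (N - 1).bit_length()           # assumed fixed-length identifier
    rows = []
    for ident in range(N - 1):             # 0 <= id <= N - 2, per (73)
        i, k = i_of(lam + ident), k_of(lam + ident)
        name = f"PSI{i}" if k == 0 else f"PSI{i},{k}"
        rows.append((ident, format(ident, f"0{width}b"), name))
    rows.append((N - 1, format(N - 1, f"0{width}b"), "PSI3"))  # per (71)
    return rows


for row in psi14_options(N=6, lam=3, n=7):
    print(*row)
```

Called with N = 6, $\lambda$ = 3 and n = 7, it prints the rows of Table 3, from `0 000 PSI1,2` down to `5 101 PSI3`.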

Example. Let N = 6 and $\lambda$ = 3. Then a complete PSI14 code specification can be obtained from (68)-(74), as shown in Table 3.

When $\lambda \ge 1$. If we exclude PSI0 from the possible options by restricting $\lambda \ge 1$, a considerable simplification in notation results since

$$ i(\lambda) = 1 \quad \text{for all } \lambda \ge 1 $$

and

$$ k(\lambda) = \lambda - 1 $$

Then, a PSI14 coder with parameters N, $\lambda$ and n has code options

$$ \Psi_{1,\,\lambda - 1 + id} \qquad \text{for } 0 \le id \le N - 2 $$

and

$$ \Psi_3 = \Psi_{1,n} \qquad \text{for } id = N - 1 $$

where id is the identifier associated with each option. Under these restrictions, PSI14 reduces to PSIss, as discussed in Ref. 7. Further restricting $\lambda$ to be fixed at $\lambda = 1$ yields the basis of VLSI implementations [8], [9].


Table 3. PSI14 Specification Example for N = 6, $\lambda$ = 3, n ≥ 7

Code Identifier    Binary Identifier,    Code Used,
Number, id         ID(id)                $\Psi_{\alpha_{id}}$
0                  000                   PSI1,2
1                  001                   PSI1,3
2                  010                   PSI1,4
3                  011                   PSI1,5
4                  100                   PSI1,6
5                  101                   PSI3


More Generality. We will henceforth refer to the above definitions. However, it is worth noting that we can further extend the generality of PSI14 by allowing $\lambda < 0$ to specify additional undefined code options PSI-1, PSI-2, .... These can be incorporated into the definitions here by noting that PSI-i = PSI-i,0, as in Eq. 66. Equations 69 and 70 become

$$ i(\lambda) = \begin{cases} \lambda & \text{if } \lambda \le 0 \\ 1 & \text{otherwise} \end{cases} $$

and

$$ k(\lambda) = \begin{cases} 0 & \text{if } \lambda \le 0 \\ \lambda - 1 & \text{otherwise} \end{cases} $$

The PSI-i might designate legitimate low-entropy options or simply act as escape mechanisms to alert a decoder that subsequent coding will be using completely different algorithms. Such escape mechanisms could also be provided by using code identifiers id ≥ N.

Decision Criteria 

Optimum. The optimum criterion for selecting the best option to represent $\delta^n$ is simply to choose the one that produces the shortest coded sequence. Using (4) we have

Choose coder option id = id* if

$$ \mathcal{L}(\Psi_{\alpha_{id^*}}[\delta^n]) = \min_{id} \mathcal{L}(\Psi_{\alpha_{id}}[\delta^n]) \tag{75} $$


Simplified. The optimum approach implies that each coded result must be generated to determine which option to use. The computation requirements can be drastically reduced by using estimates instead.

Let

$$ \hat{\Psi}_{\alpha_{id}}(\delta^n) \approx \mathcal{L}(\Psi_{\alpha_{id}}[\delta^n]) $$

be the estimated coded length when using code option $\Psi_{\alpha_{id}}$. Then the simplified decision criterion becomes:

Choose coder option id = id* if

$$ \hat{\Psi}_{\alpha_{id^*}}(\delta^n) = \min_{id} \hat{\Psi}_{\alpha_{id}}(\delta^n) \tag{76} $$


Now consider estimates for the specific PSI14 options in (64). Trivially we have

$$ \mathcal{L}(\Psi_3[\delta^n]) = \hat{\Psi}_3(\delta^n) = nJ \tag{77} $$


For convenience, we repeat (43) here as

$$ \mathcal{L}(\Psi_0[\delta^n]) \approx \hat{\Psi}_0(\delta^n) = \lceil J/2 \rceil + 2(F_0 - J) \tag{78} $$

Now consider the PSI1,k options in (58). First we have

$$ \mathcal{L}(\Psi_{1,k}[\delta^n]) = \mathcal{L}(\Psi_1[M^{n-k}]) + Jk \tag{79} $$

since the length of the separate least-significant bit sequence is fixed.

Extending notation, we let

$$ F_k = \mathcal{L}(\Psi_1[M^{n-k}]) \tag{80} $$

be the length of the Fundamental Sequence for the n-k most-significant bits $M^{n-k}$. The special case $F_0$ is shown in (30) to be simply the sum of the samples of $M^{n-0} = \delta^n$ plus the block size J. This same calculation applies more generally to the samples of $M^{n-k}$.


Taking advantage of the randomness in the least-significant bits, the expected value for each $F_k$ can be related to $F_0$ by [5]

$$ E(F_k \mid F_0) = 2^{-k} F_0 + \frac{J}{2}\left(1 - 2^{-k}\right) \tag{81} $$

By using (81) we can then estimate the overall length for each PSI1,k option as, from (79),

$$ \mathcal{L}(\Psi_{1,k}[\delta^n]) \approx \hat{\Psi}_{1,k}(\delta^n) = 2^{-k} F_0 + \frac{J}{2}\left(1 - 2^{-k}\right) + Jk \tag{82} $$
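Equation (81) is an expectation over random least-significant bits. One quick way to check it is to take a block in which every k-bit residue occurs equally often; for such a block (a toy choice of ours, not data from the report) (81) holds exactly, and (82) then gives the PSI1,k length estimate. The helper names below are illustrative.

```python
def fs_length(values):
    """Length of a Fundamental Sequence: the sum of (value + 1), as in (30)."""
    return sum(v + 1 for v in values)


def expected_fk(f0, J, k):
    """E(F_k | F_0) from (81)."""
    return 2.0 ** -k * f0 + (J / 2) * (1 - 2.0 ** -k)


def estimate_psi1k(f0, J, k):
    """Estimated PSI1,k block length from (82): E(F_k | F_0) + J*k."""
    return expected_fk(f0, J, k) + J * k


block = list(range(16))            # J = 16; each 2-bit residue appears 4 times
J, k = len(block), 2
f0 = fs_length(block)
fk = fs_length([s >> k for s in block])
print(f0, fk, expected_fk(f0, J, k), estimate_psi1k(f0, J, k))  # 136 40 40.0 72.0
```

For real data the least-significant bits are only approximately uniform, so (81) and (82) are estimates, which is all the simplified criterion (76) needs.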

Using the simplified decision criterion in (76) with the estimates in (77), (78) and (82) leads to clear-cut decision regions based on $F_0$. An example for an 8-option PSI14 with starting option PSI0 ($\lambda = 0$) is shown in Table 4.
Table 4. PSI14 Decision Regions, N = 8, $\lambda$ = 0

Code Option    Decision Region (N = 8, $\lambda$ = 0)
PSI0           $F_0 \le 3J/2$
PSI1           $3J/2 < F_0 \le 5J/2$
PSI1,1         $5J/2 < F_0 \le 9J/2$
PSI1,2         $9J/2 < F_0 \le 17J/2$
PSI1,3         $17J/2 < F_0 \le 33J/2$
PSI1,4         $33J/2 < F_0 \le 65J/2$
PSI1,5         $65J/2 < F_0 \le (64n - 351)J/2$
PSI3           $(64n - 351)J/2 < F_0$


It is a simple matter to create equivalent tables for other PSI14 configurations. 
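Such tables need not be derived by hand: the simplified criterion (76) can be applied numerically to the estimates (77), (78) and (82). The sketch below assumes the PSI0 estimate in (78) uses $\lceil J/2 \rceil$ (the two readings coincide for even J) and represents each option by a hypothetical (i, k) pair, with (1, n) standing for PSI3. Ties go to the lower option.

```python
import math


def estimated_length(f0, J, n, i, k):
    """Estimated coded length of option (i, k) for a block with FS length f0."""
    if i == 0:
        return math.ceil(J / 2) + 2 * (f0 - J)                    # PSI0, (78)
    if k == n:
        return n * J                                              # PSI3, (77)
    return 2.0 ** -k * f0 + (J / 2) * (1 - 2.0 ** -k) + J * k     # PSI1,k, (82)


def choose_option(f0, J, n, options):
    """Simplified criterion (76): id of the option with the smallest estimate."""
    estimates = [estimated_length(f0, J, n, i, k) for i, k in options]
    return min(range(len(options)), key=lambda ident: estimates[ident])


n, J = 12, 16
options = [(0, 0), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, n)]  # N = 8, lambda = 0
for f0 in (24, 25, 40, 41, 72, 73, 3336, 3337):
    print(f0, choose_option(f0, J, n, options))
```

With J = 16 and n = 12 the chosen id steps up just past $F_0$ = 24, 40, 72 and 3336, that is, past 3J/2, 5J/2, 9J/2 and (64n - 351)J/2, in line with Table 4.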

Modified Simplified? The decision regions for the various PSI1,k options in Table 4 rest on the assumption that the least-significant bits being split are random. This is a very good assumption for all but the $F_0 = 5J/2$ decision point that determines whether PSI1 or PSI1,1 should be chosen.

At low entropies, below 3 bits/sample, the least-significant bits start becoming less random. Consider how this affects the optimum decision point given by Ref. 5:

Choose PSI1,1 if

$$ F_0 > 3J - \sum (\text{least-significant bits}) \tag{83} $$

If half of the least-significant bits are ones (corresponding to random), PSI1,1 is better if $F_0 > 5J/2$, the same result as in Table 4.
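Rule (83) is exact rather than estimated: with S ones among the least-significant bits, $\mathcal{L}(\Psi_{1,1}[\delta^n]) = (F_0 - J - S)/2 + 2J$, and requiring this to be less than $F_0$ gives (83). The sketch below (hypothetical helper names, bit counts only) compares the rule with a direct length count. The block [2, 2, 2, 2] has $F_0 = 5J/2 + 2$ but no odd samples, so the fixed 5J/2 point would pick PSI1,1 even though it is no shorter than PSI1.

```python
def psi1_length(samples):
    """Bits for PSI1: the Fundamental Sequence length F0."""
    return sum(s + 1 for s in samples)


def psi1_1_length(samples):
    """Bits for PSI1,1: FS of the samples shifted right by 1, plus J raw LSBs."""
    return sum((s >> 1) + 1 for s in samples) + len(samples)


def prefer_psi1_1(samples):
    """Rule (83): PSI1,1 is shorter iff F0 > 3J - (number of LSBs equal to 1)."""
    J = len(samples)
    ones = sum(s & 1 for s in samples)
    return psi1_length(samples) > 3 * J - ones


for block in ([10, 4, 3, 7, 5, 0, 2], [2, 2, 2, 2], [0, 0, 1, 0, 0, 0, 2, 0]):
    direct = psi1_1_length(block) < psi1_length(block)
    print(block, prefer_psi1_1(block), direct)
```

In every case the rule and the direct comparison agree, which is the point of (83).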
However, as entropy drops there is a greater tendency for more zeroes to occur
than ones, which culminates in all zeroes at an entropy of zero. With more zeroes, the 
decision region specified in (83) moves upward. This leaves open the question of 
whether the fixed simplified decision point of 5J/2 in Table 4 should be adjusted 
upward. 

One can expect only minor average performance gains, if any. Actual 
comparisons between the simplified rule in Table 4 and the optimum rule indicate very 
little advantage for the optimum rule. 

Baseline PSI14 Performance 

PSI14 coders which use code options PSI0 or PSI1 as their starting option ($\lambda = 0$ or $\lambda = 1$) are called "Baseline" PSI14 coders.

The average measured performance for N = 8 Baseline PSI14 coders is shown in Fig. 15.

Fig. 15. Baseline PSI14 Performance, N = 8, $\lambda$ = 0, 1, n = 12
These graphs were primarily derived from 12-bits/sample imaging spectrometer data and assumed a block size of J = 16. Some areas above 8 bits/sample were approximated since very few samples were available. But exact precision is not the issue here. These curves are approximately correct for almost any real problem. The major observation is that efficient performance from ~0.75 bit/sample to 7.5 bits/sample can be achieved with an 8-option PSI14 coder with starting option PSI0. This roughly 7-bits/sample Dynamic Range can be pushed upward by about 1 bit/sample by starting with PSI1 as the first option. The additional top-end performance is obtained at the expense of some low-end performance.

The corresponding graphs for Baseline PSI14 coders with N = 4 options are shown in Fig. 16. Voyager images were the source of the data [5]. With only four options, the Dynamic Range is, of course, much narrower.
Coding for Very High Entropies

The performance curves for the 8-option Baseline PSI14 coders in Fig. 15 
provide a broad Dynamic Range, which should be adequate for many applications. 
However, many newer scientific instruments have pushed the quantization 
requirements to much higher levels, some to as much as 16 bits/sample. The 
consequence is that entropies above 8 bits/sample (viewed as "Very High Entropies") 
may be present much of the time. This can lead to inefficiencies if restricted to the 
Baseline PSI14 performance curves in Fig. 15. 

Adding Options. Each additional PSI1,k option added to a Baseline PSI14 
coder will extend the top end of the Dynamic Range upward by 1 bit/sample. Thus, 
one solution is simply to add more options until the expected "very high" entropy range 
is covered. 

Assuming the simple fixed-length representation for the code identifier as specified by (25), increasing the number of options beyond N = 8 into the range 9 ≤ N ≤ 16 will require an additional identifier bit. The potential performance impact is, at most, 1/J bits/sample, or 0.0625 bit/sample for a typical block size of J = 16. Performance can still be considered "efficient."

Such a penalty is probably insignificant considering the almost 15-bits/sample 
Dynamic Range provided by an N = 16, J = 16 Baseline PSI14 coder. Moreover, if the 
data source causes frequent and significant variations in data entropy, the 1/J cost will 
be more than compensated by the ability to choose from a greater number of options. 

Moving the Operating Range. After looking more closely at the class of real 
problems which generate very high entropies, it may not be necessary to add more 
options. We note that: 

• As the highest expected data entropies increase, so usually do the lowest. Thus, the complete range of expected entropies tends to move upward, not just the top end. (84)

• Efficient coding performance is really only necessary over the expected range of data entropies. (85)

• The maximum expected range of data entropy (max entropy - min entropy) is unlikely to be as large as the Dynamic Range of efficient performance exhibited by an 8-option Baseline PSI14, as shown in Fig. 15 (approximately 7 bits/sample). (86)

These observations point to two other approaches to very-high entropy coding 
without extending the number of options. 

The first approach is inherent in the general definition of PSI14 in (63)-(73). Just pick a starting code option for a non-baseline 8-option coder (i.e., $\lambda > 1$).

That first code option determines the lower end of efficient performance for your 
PSI14 coder. Pick that option so that the full 7-bits/sample Dynamic Range is centered 
over the expected entropy range of your problem. Table 5 should help guide you in 
that decision; it is derived from the performance runs in Fig. 15 and simplified decision 
rules, as in Table 4. 


Table 5. PSI14 Dynamic Ranges for 8-option Coders, J = 16 
Parametric Dynamic Range. In fact, we can extend these results to specify the Dynamic Range for any PSI14 with parameters N, n and $\lambda$.⁸ By using (69) and (70), we have⁹

$$ \lambda + 0.75 - 0.25\, i(\lambda) \;\le\; H_\delta \;\le\; \min\{\lambda + N - 0.5,\ n\} \tag{87} $$

Note that the implementation of a PSI14 coding module with a shiftable Dynamic Range would require external control of the parameter $\lambda$ (starting option).
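Equation (87) is easy to tabulate. The sketch below takes the upper limit as min{λ + N - 0.5, n}; the cap at n is an assumed reading of a garbled term in the original, and `dynamic_range` is a hypothetical helper. With N = 8 it gives about 0.75 to 7.5 bits/sample for λ = 0 and a range shifted up by roughly 1 bit/sample for λ = 1, consistent with the Baseline discussion above.

```python
def dynamic_range(N, lam, n):
    """Approximate efficient entropy range (bits/sample) of an N-option PSI14, per (87).

    The min(..., n) cap on the upper limit is an assumed reading of the original.
    """
    i = 0 if lam == 0 else 1               # i(lambda) from (69)
    low = lam + 0.75 - 0.25 * i
    high = min(lam + N - 0.5, n)
    return low, high


for lam in (0, 1, 3):
    print(lam, dynamic_range(N=8, lam=lam, n=12))
```

For N = 16 and λ = 0 the same formula gives 0.75 to 15.5 bits/sample, the "almost 15-bits/sample" span mentioned earlier.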

We will later return to the issue of Dynamic Range after discussing another 
approach. 

CODE OPERATOR PSI14,K

Another approach that is statistically equivalent to the non-baseline PSI14 coders suggested in Table 5 is obtained by returning to the basic definition of Split-Sample code option PSIi,k, which is specified in (58) and Fig. 12.

PSIi specifies the coder that is used to represent the most-significant n-k bits after splitting off k least-significant bits. The original Split-Sample coders used PSI4 for PSIi. In looking for implementation simplification, PSI1 was investigated, and that led to PSI14, a coder which (except for PSI0 and PSI3, which is actually PSI1,n) uses various PSI1,k as code options. Now replace PSIi with PSI14 itself.

From Fig. 12, and using a capital K for the number of split least-significant bits, the code "option" PSI14,K becomes that shown in Fig. 17. The internal PSI14 is shown with parameters N and $\lambda$ to indicate the number of options and the starting option, respectively.


⁸ Some caution should be used in the interpretation of (87) for small block sizes (J < 12), where the impact of identifier bits becomes increasingly significant. The subject of identifier coding is addressed in Chapter 4.

⁹ Where the first term would simplify to $\lambda + 0.5$ when $\lambda \ge 1$ [7].
Fig. 17. Split-Sample Coder PSI14,K


Performance for K = 0 

With K = 0, all data pass directly to the internal PSI14, so that

$$ \Psi_{14,0} = \Psi_{14} \tag{88} $$

Thus, PSI14 is really a special case of PSI14,K. If the internal PSI14 in Fig. 17 is an N = 8 option Baseline PSI14, the Dynamic Range is given in Fig. 15 and (87). We will henceforth use this new notation, PSI14,0, for PSI14 to emphasize the identity when K = 0.


Performance for K > 0 

To see what happens when $K \ne 0$, note again that by (66) and (67), ALL the options specified for PSI14,0 are of the form PSIi,k. Returning to Fig. 17, it is easy to see that the net effect of a split of K bits is to transform each PSIi,k in the internal PSI14 into another Split-Sample option with K additional splits. That is, equivalently, for $\lambda \ge 0$ and $0 \le id \le N - 2$,

$$ \Psi_{i(\lambda + id),\,k(\lambda + id)} \;\rightarrow\; \Psi_{i(\lambda + id),\,k(\lambda + id) + K} \tag{89} $$
With each code option affected in the same manner, it should be clear that the entire Dynamic Range for any PSI14,K will be shifted upward by 1 bit/sample for each increase in K.¹⁰

Then we can specify the Dynamic Range for any PSI14,K configuration for $K \ge 0$ as¹¹

$$ K + \lambda + 0.75 - 0.25\, i(\lambda) \;\le\; H_\delta \;\le\; \min\{K + \lambda + N - 0.5,\ n\} \tag{90} $$

Remember, efficient means average performance close to the entropy. It doesn't mean that slightly better performance isn't possible under certain conditions. That is, one "efficient" coder option may still be better than another "efficient" coder option over a particular range. With this in mind, we look next at the comparison of an N-option PSI14,K and its closest equivalent N-option PSI14,0.

INTERNAL $\lambda \ge 1$. If we momentarily exclude the possibility of using PSI0 as a PSI14,0 option, we can say that for any N-option PSI14,K there is an equivalent PSI14,0 with exactly the same Dynamic Range. That is:

If an N-option internal PSI14,0 of PSI14,K starts with option PSI1,k(λ), λ > 0, then an N-option PSI14,0 that starts with PSI1,k(λ)+K will exhibit the same Dynamic Range for K > 0.

INTERNAL $\lambda = 0$. When we include the possibility of PSI0 as the starting option, we cannot make as precise a statement.

Consider the case where K = 0, 1, 2. The effective options for PSI14,K are


¹⁰ Of course, this is not precise when the top end of the performance range approaches n, the number of quantization bits. The same is true at the very low end, where the LSBs are not totally random, as discussed earlier.

¹¹ Subject to the same caution for small values of J (J < 12).
K = 0:   PSI0,0   PSI1,0   PSI1,1   PSI1,2 ...

K = 1:   PSI0,1   PSI1,1   PSI1,2   PSI1,3 ...

K = 2:   PSI0,2   PSI1,2   PSI1,3   PSI1,4 ...
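The option lists above follow mechanically from (89). As an illustrative sketch (the helper name is ours), the code below applies the K-bit shift to the first N - 1 internal options of a PSI14,0 with λ = 0; the PSI3 backup option is left out.

```python
def effective_options(N, lam, K):
    """(i, k) of the first N-1 options of PSI14,K, obtained via (89).

    Internal option (i(lam+id), k(lam+id)) becomes (i(lam+id), k(lam+id) + K).
    """
    result = []
    for ident in range(N - 1):
        p = lam + ident
        i = 0 if p == 0 else 1          # i() from (69)
        k = 0 if p == 0 else p - 1      # k() from (70)
        result.append((i, k + K))
    return result


for K in range(3):
    names = ["PSI%d,%d" % opt for opt in effective_options(N=5, lam=0, K=K)]
    print("K =", K, " ".join(names))
```

Each row reproduces the corresponding line of the list, and the first entry, PSI0,K for K > 0, is exactly the option that no PSI14,0 contains.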

Indeed, the Dynamic Range will shift upward by approximately 1 bit/sample for each increase in K. But are these configurations identical to some PSI14,0 with a different starting option?

For this example, and in general, there is no PSI14,0 configuration that includes starting option PSI0,k' for k' > 0 as an option.

If we replace starting option PSI0,k' with PSI1,k'-1, we do get a legitimate PSI14,0 definition, since all the code options after the first already fit. So the only question is: How does PSI0,k' compare with PSI1,k'-1? This comparison is statistically the same for every k' ≥ 1; it just happens at different overall entropy values. So we only need to compare PSI0,1 with PSI1 = PSI1,0 at low entropies; at higher entropies, other options will be used.

We provide this comparison in Appendix C and conclude that over the low-entropy range of interest, the average performance of PSI1 is probably always slightly better than that of PSI0,1. Using this observation and (90), we can say that for K > 0, over the entropy range of the first option

$$ 0.5 + K \le H_\delta \le K + 1.5 \tag{92} $$

a PSI14,K with an internal N-option PSI14,0 with $\lambda = 0$ will not perform quite as well as one with an internal N-option PSI14,0 which starts with PSI1,K-1. Performance over the remainder of the Dynamic Range should be identical.

From Yeh's result, this is not surprising [11]. She showed that the PSI1,k options are equivalent to Huffman codes over the range of interest.
PSI14,K vs PSI14,0 for K > 0

Basically, the difference in performance between any PSI14,K configuration and its nearest equivalent PSI14,0 is either nonexistent or too minor to exclude one approach or the other. The choice should be based on implementation considerations. Let us now discuss some specific implementation examples.

Single Dynamic Range. Here we must design a coder which works efficiently over a single prescribed entropy range. If this coder is being designed from scratch, there is no point in including PSI0 as an option, since it offers no advantage when K > 0 (higher entropies). Under these conditions, the performance of either approach is identical. On the surface at least, it's a toss-up.

However, if the design of a fixed PSI14,0 with starting option PSI0 already exists, it is clearly simpler to turn it into a fixed PSI14,K than to redesign a new PSI14,0 with a new $\lambda$ to meet a specific Dynamic Range goal.

Multiple Dynamic Ranges. In this case the implemented coder must include an external parameter to adjust the Dynamic Range: either K for PSI14,K or $\lambda$ for PSI14,0. Now K > 0 and $\lambda$ > 0 are possible, all in the same coder. External control of these parameters would be essentially identical to a user.

A PSI14,K has one major advantage: the adaptive part of the coder is fixed (i.e., the internal PSI14,0). All the code-option decision making and the assignment of identifiers would not change as K was adjusted. This is not true for a PSI14,0 with adjustable $\lambda$.

On the other hand, PSI14,0 with adjustable $\lambda$ maintains a slightly simpler output format. But overall, the edge in simplicity for this situation seems to be with PSI14,K.

Stretch PSI14 (sPSI14) 

Another way to obtain a broad Dynamic Range, at some penalty in 
performance, is to build an adaptive coder that only uses every other standard PSI14 
code option; we denote this by sPSI14. 
By using the notation developed in (63)-(73), we can define all the code options of an N-option sPSI14 by

$$ \Psi_{i(\lambda + 2id),\,k(\lambda + 2id)} \qquad \text{for } \lambda \ge 0 \text{ and } 0 \le id \le N - 2 \tag{93} $$

and, as for PSI14 itself,

$$ \Psi_{\alpha_{N-1}} = \Psi_3 \tag{94} $$

Such an sPSI14 will have a "potential" Dynamic Range given by

$$ \lambda + 0.75 - 0.25\, i(\lambda) \;\le\; H_\delta \;\le\; \min\{\lambda + 2(N - 1) - 0.5,\ n\} \tag{95} $$

for n-bit input data. For large N, this is roughly double the range given in (87) for a PSI14 using the same number of options.
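To make the "every other option" structure of (93) concrete, the sketch below lists the sPSI14 options and its potential range. The upper limit is taken as min{λ + 2(N - 1) - 0.5, n}; the cap at n is an assumed reading of a garbled term in the original, and `spsi14` is a hypothetical helper.

```python
def spsi14(N, lam, n):
    """Options (93)-(94) and potential Dynamic Range (95) of an N-option sPSI14."""
    if lam + 2 * (N - 2) > n:
        raise ValueError("top option would run into PSI3 = PSI1,n")
    options = []
    for ident in range(N - 1):
        p = lam + 2 * ident                      # every other position in (64)
        options.append("PSI0" if p == 0 else ("PSI1" if p == 1 else f"PSI1,{p - 1}"))
    options.append("PSI3")                       # (94)
    i = 0 if lam == 0 else 1                     # i(lambda) from (69)
    low = lam + 0.75 - 0.25 * i
    high = min(lam + 2 * (N - 1) - 0.5, n)       # cap at n is an assumption
    return options, (low, high)


opts, rng = spsi14(N=8, lam=1, n=16)
print(opts)
print(rng)
```

For N = 8 and λ = 1 the options are PSI1, PSI1,2, PSI1,4, ..., PSI1,12 and PSI3, and the potential range runs from 1.5 to 14.5 bits/sample, before the skipped-option penalty discussed next.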

We say "potential" Dynamic Range because there is a performance penalty in 
skipping options. 

The consequence should be apparent from Fig. 18, which shows how the performance of three adjacent PSI1,k are related. These graphs are not precise and are only intended to make a point. Greater accuracy under various conditions is provided by Yeh [11].

Referring to Fig. 18, an sPSI14 would have options PSI1,m and PSI1,m+2 but not PSI1,m+1. Under ideal stationary conditions, the average performance given up by not including PSI1,m+1 is indicated by the crosshatched region (adjusted slightly because generally one less identifier bit would be needed). Under most real conditions, it is speculated that this average penalty will be spread over a broader entropy range. An sPSI14 performance curve will look like a slightly inefficient and "bumpy" PSI14 performance curve over the same entropy range. Specific details must await simulations.
Fig. 18. Skipping an Option


CURRENT VLSI IMPLEMENTATIONS 

GSFC and the University of Idaho have implemented a custom 1.0-µm CMOS VLSI version of the Coding Module in Fig. 4, as well as a compatible "decoding" module [8]. The coder incorporates most of the general-purpose functionality in that diagram. However, the PSI14,K coder included is actually PSI14,0 with N = 12 options, $\lambda$ = 1 and J = 16. Input quantization can vary from n = 4 to n = 14. The coder and the decoder were recently tested successfully under laboratory conditions. The coder was operated at input data rates of up to 700 Mbits/s and the decoder at half that rate.


JPL has used different technologies to implement two custom 1.6-µm CMOS VLSI versions of the Coding Module in Fig. 4. Because of certain instrument-specific constraints, these modules include fewer of the more general functions of Fig. 4. The internal coders are in each case PSI14,0 with N = 11 options, $\lambda$ = 1, J = 16 and allowance for input quantization of n = 12 bits/sample. Both versions have been tested in the laboratory at up to 180 Mbits/s [9]. An enhanced version of this coder is being implemented for the CRAF/Cassini project.
IV. OUTSIDE THE MODULE


Chapters II and III have filled in the details of coding module PSI14,K+ shown in Fig. 4. This chapter briefly addresses some issues relating to the use of PSI14,K+ within a "compression system."

EXTERNAL SPLIT-SAMPLES 

One interesting observation that can affect a compression system's 
implementation is that the concept of Split-Sample Modes can also be accomplished 
outside the module for many applications, as illustrated in Fig. 19. 



Fig. 19. Internal vs External Splits 


In this illustration: 

• X is a "raw" n-bit data sequence (e.g., imaging or magnetometer data). 

• The indicated PSI14,K+ uses the built-in preprocessor directly or with an external predictor.
The upper coding process, within the dashed lines, first performs an i-bit split directly on X. By using the Split-Sample coding structure shown in Fig. 12, the sequence of most-significant (n-i)-bit samples of X (i.e., $M^{n-i}$) is coded by PSI14,K+, and the least-significant i-bit samples (i.e., $L^i$) are passed on unchanged.

In the lower coding process, all of X is passed directly to a PSI14,K+ module where the parameter K has been increased by i to K' = K + i.

The message from this figure is that the upper and lower coding processes perform almost identically and exhibit the same Dynamic Range.¹² Computer simulations indicate that performing Split-Sample operations entirely within PSI14,K+ (the lower path) gives a slight edge in performance of less than 0.05 bit/sample [5].

Some observations: 

• The effective Dynamic Range of a given PSI14,K+ module can be shifted upward externally to the module (e.g., to entropy values not supported by the largest value of K or $\lambda$). The Galileo image compressor uses this approach to shift the performance for a complete image line [13].

• A PSI14,K+ module that can only accommodate data quantized up to n bits/sample might be usable on (n+i)-bit data if the upward i-bits/sample shift in Dynamic Range is acceptable. For example, a PSI14,K coder with a Dynamic Range 2.5 < $H_\delta$ < 8.5, which was originally designed to work only on 12 bits/sample data, could now be used on 14 bits/sample data, but with a new Dynamic Range of 4.5 < $H_\delta$ < 10.5.

• Another advantage is provided in systems employing Priority Driven Rate Control, as discussed in a later section.


¹² This would also be true for PSI14,0 with starting option parameter $\lambda$ increased by i.
DECODING PREPROCESSED DATA


PSI14,K+ allows for input data to be passed directly to the adaptive variable-length coding section, PSI14,K, without preprocessing. An input block $\delta^n$ is coded as PSI14,K[$\delta^n$]. Given this coded sequence, $\delta^n$ can be reconstructed precisely; no other information is required. This is not necessarily the case when preprocessing is involved.

Let us review the preprocessing and coding process at the sample level. The error between a sample $x_i$ and its predicted value $\hat{x}_i$ produces the error value

$$ \Delta_i = x_i - \hat{x}_i \tag{96} $$

which is converted to

$$ \delta_i \tag{97} $$

by the reversible process in (17), or by (18) and (19). Blocks of $\delta_i$ are then coded. During reconstruction, each $\Delta_i$ can clearly be retrieved by reversing these steps. Then the original sample $x_i$ can be reconstructed as

$$ x_i = \hat{x}_i + \Delta_i \tag{98} $$

Thus, reconstruction of any individual sample can be accomplished, provided that its original prediction is known or can be recomputed from available information.

To see how this fits various applications, we must define some terms to handle sequences of blocks. Let

$$ X = X_1 * X_2 * \cdots * X_m \tag{99} $$

denote a sequence of blocks, where the individual $X_i$ are known a priori to be $J_i$ samples long, so that

$$ X_i = x_{i,1}\, x_{i,2} \cdots x_{i,J_i} \tag{100} $$
The corresponding vector of prediction values used during the coding process is then

$$ \hat{X}_i = \hat{x}_{i,1}\, \hat{x}_{i,2} \cdots \hat{x}_{i,J_i} \tag{101} $$

and the resulting error sequence is then

$$ \Delta_i = \Delta_{i,1}\, \Delta_{i,2} \cdots \Delta_{i,J_i} \tag{102} $$

where

$$ \Delta_{i,j} = x_{i,j} - \hat{x}_{i,j} \tag{103} $$

The equivalent prediction and error sequences for multiple blocks are then

$$ \hat{X} = \hat{X}_1 * \hat{X}_2 * \cdots * \hat{X}_m \tag{104} $$

and

$$ \Delta = \Delta_1 * \Delta_2 * \cdots * \Delta_m \tag{105} $$

We can then say that all of X can be recovered precisely from $\Delta$, provided that $\hat{X}$ is known or can be computed.

Thus, the coding of each $X_i$ by PSI14,K+ is really a function of the prediction used, as

$$ \Psi_{14,K+}[X_i, \hat{X}_i] \tag{106} $$

rather than the short form

$$ \Psi_{14,K+}[X_i] \tag{107} $$

which we have been using (and will continue to use); (106) merely emphasizes what we have implicitly assumed in (107).
One-Dimensional Predictor

For this case we assume that the X in (99) really represents a partitioning of a single long sequence of sampled data from a single source into several smaller blocks. That is, the first sample of $X_i$ follows the last sample of block $X_{i-1}$. All the samples are adjacent in time. For example, X could represent a single image line.

Using the built-in predictor of PSI14,K+ means predicting that the next sample is the same as the last, so that for block $X_i$

$$ \hat{x}_{i,j} = x_{i,j-1} \qquad \text{for } j \ge 2 \tag{108} $$

and

$$ \hat{x}_{i,1} = x_{i-1,J_{i-1}} \qquad \text{for } j = 1 \tag{109} $$

This means that the prediction is always known, provided that a "last sample" exists. This is true except for the first sample of the first block, $x_{1,1}$. A prediction for this first sample must be supplied to "initialize" both a PSI14,K+ coder and decoder. The coder must have $\hat{x}_{1,1}$ to generate the first error, $\Delta_{1,1}$, and a decoder needs it to recompute the first sample, $x_{1,1} = \hat{x}_{1,1} + \Delta_{1,1}$. This initial prediction value is usually called a Reference Sample.

The overall coding of $X$ can be described by

$$\Psi_{15,K+}[X] = \mathrm{REF} * \Psi_{14,K+}[X_1] * \Psi_{14,K+}[X_2] * \cdots * \Psi_{14,K+}[X_m] \qquad (110)$$

where it is presumed that the PSI14,K+ module is initialized by "REF."
In practice, REF can be obtained in several ways:

a) If REF is already known to both coder and decoder, it can be omitted altogether.

b) If REF was set to some arbitrary known constant, it could also be omitted. However, the initial $\Delta_{1,1}$ could be quite large, causing the wrong code option to be used for the remainder of the first block, $X_1$. This could be expensive in bits.

c) REF could be set to the first sample in $X_1$, causing the initial difference to be zero. Keep in mind that the contribution of a zero to any Fundamental Sequence is only 1 bit (see (27)).

d) The first block can be split into two parts: a Reference Sample and a new $(J_1 - 1)$-sample block $X_1'$ that begins with the second sample of $X_1$, as illustrated in Fig. 20.
Fig. 20. Reference Sample Extraction

This approach is better than (c) by only 1 bit, but it appears to offer a hardware advantage [8]. Consequently, the University of Idaho VLSI team has incorporated the capability to extract a Reference Sample in this way into their current VLSI module implementation.
External Predictor


The PSI14,K+ module allows for input of an external prediction on a sample-by-sample basis. We will consider a few practical applications here. First, note that the form of coding specified in (110) does not really depend on the type of predictor used. Of course, the actual prediction must be known for each sample by both coder and decoder. PSI15,K+ defines the form for coding a sequence of blocks using PSI14,K+, regardless of the prediction method used. The REF sample may or may not actually be present.

Two-dimensional (2-D) Arrays. Fig. 6 introduced the concept of using 2-D 
prediction for image applications. We can now be more specific about formats and 
initialization problems. 

To aid in the discussion we again extend our notation to include a parameter specifying the $j$th line of a 2-D array. Then

$$\{X, X_i, x_{i,\ell}, \ldots\} \rightarrow \{X(j), X_i(j), x_{i,\ell}(j), \ldots\} \qquad (111)$$

where now $X(j)$ represents the $j$th line, etc. For simplicity we presume that the length of any block in one line is the same as that of the corresponding block in another line, as illustrated in Fig. 21.


[Figure: lines j-1 and j of a 2-D array, each partitioned into blocks (block 1, ..., block i, ...); the ℓth sample of the ith block in the jth line is marked.]

Fig. 21. Two-Dimensional Array


Using Fig. 6, we have the equation for a 2-D prediction of the $\ell$th sample in the $i$th block of line $j$, as¹³

$$\hat{x}_{i,\ell}(j) = \left\lfloor \frac{x_{i,\ell-1}(j) + x_{i,\ell}(j-1)}{2} \right\rfloor \qquad (112)$$

where $x_{i,0}(j)$ is interpreted to be the last sample of the preceding block, $x_{i-1,J_{i-1}}(j)$. This is a legitimate prediction, provided that each term exists. Unfortunately, such is not the case for all $i, j, \ell$. We need to investigate how the predictions should be modified to ensure the reconstruction of $x_{i,\ell}(j)$ as

$$x_{i,\ell}(j) = \hat{x}_{i,\ell}(j) + \Delta_{i,\ell}(j) \qquad (113)$$

¹³ There are other possible versions which could be substituted for Eq. 112. In most cases they are unlikely to alter performance significantly.
Consider the most common arrangement for coding and decoding: Line $j$ follows Line $j-1$, and each line is coded from left to right.

When $X(1)$ is coded, there is no previous line, so the term $x_{i,\ell}(j-1)$ in (112) is not valid. Thus, $X(1)$ should be coded with a one-dimensional (1-D) predictor, as described in the previous section.

On subsequent lines, only the first term in (112) for the first sample of the first block, $x_{1,0}(j)$, is missing. This could be handled in a number of ways. From a performance point of view, the best method is to employ a 1-D prediction using the first sample from the line above, as

$$\hat{x}_{1,1}(j) = x_{1,1}(j-1) \qquad (114)$$

This is generally as good as a 1-D prediction in the same line.

Fig. 22 reviews the prediction strategy described above. 



Fig. 22. 2-D Prediction Regions, Common Format: 
Top-to-Bottom, Left-to-Right Coding 
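A minimal Python sketch of the Fig. 22 strategy follows, assuming the image is a list of equal-length rows of non-negative integers and `ref` is the Reference Sample for line 1. The names are hypothetical. Line 1 uses the 1-D predictor of (108)-(109); each later line uses (114) for its first sample and the floored average of (112) elsewhere. Because $x_{i,0}(j)$ is just the previous sample in the same line, block boundaries need no special handling here. The decoder mirrors the coder, recomputing each prediction from already-decoded samples as (113) requires.

```python
from typing import List

def _pred(data: List[List[int]], j: int, c: int, ref: int) -> int:
    """Prediction for sample c of line j, using only samples a decoder already has."""
    if j == 0:
        return ref if c == 0 else data[0][c - 1]             # 1-D, (108)-(109)
    if c == 0:
        return data[j - 1][0]                                 # line above, (114)
    return (data[j][c - 1] + data[j - 1][c]) // 2            # floored average, (112)

def predict_2d(img: List[List[int]], ref: int) -> List[List[int]]:
    return [[_pred(img, j, c, ref) for c in range(len(img[0]))]
            for j in range(len(img))]

def decode_2d(errors: List[List[int]], ref: int) -> List[List[int]]:
    rows, cols = len(errors), len(errors[0])
    out = [[0] * cols for _ in range(rows)]
    for j in range(rows):                                     # top to bottom
        for c in range(cols):                                 # left to right
            out[j][c] = _pred(out, j, c, ref) + errors[j][c]  # (113)
    return out
```

With `img = [[10, 12, 14], [11, 13, 15]]` and `ref = 10`, the predictions are `[[10, 10, 12], [10, 11, 13]]`, and `decode_2d` rebuilds `img` exactly from the errors.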






The array coding format for the $j$th line takes the form from (110)

$$\Psi_{15,K+}[X(j)] = \mathrm{REF} * \Psi_{14,K+}[X_1(j)] * \cdots * \Psi_{14,K+}[X_m(j)] \qquad (115)$$

If the prediction of (114) is used, then REF will be present only for $j = 1$. In either case, the coding of the complete array takes the form

$$\Psi_{15,K+}[X(1)] * \Psi_{15,K+}[X(2)] * \cdots \qquad (116)$$

that is, coded line 1 followed by coded line 2, etc.

Sensor Noise. Some scientific instruments generate a two-dimensional image by "sweeping" a single line of individual one-picture-element sensors across a target. The sweeping action is usually supplied by spacecraft motion. For example, the High-Resolution Imaging Spectrometer (HIRIS) instrument [14] will simultaneously sweep 192 such line sensors, each representing a different spectral band (and producing separate two-dimensional images). The direction along the swept line is usually called "Cross-track," and the direction being swept is called "Down-track." Figure 23 is provided to help keep these distinctions in mind.






The figure shows a single line of sensor elements (running cross-track) from each spectral band. At time $t_1$, each line of sensor elements is exposed to a corresponding line of reflections from Earth, which creates a single digital line of a familiar image (see below).

The composite of all such lines is called a "frame." At time t2, the spacecraft has 
moved the "view" of each line sensor to the next "Down-track" position, which creates 
a second frame. At time t3, the view has again moved, and so on. The composite of the 
digital lines created by each swept spectral line forms a two-dimensional image, as 
illustrated for the first spectral band. 

Unfortunately, the individual uncalibrated sensor elements of today's instruments tend to have different gain and offset characteristics. This means that each sensor element will not necessarily generate the same output for a given input. The result appears as an additional randomness (which we can call SENSOR NOISE) when trying to predict the values of one sensor element by using the known values of another along this line (cross-track), as the built-in predictor of a PSI14,K+ module does. The consequence of this additional randomness is an increase in prediction entropy, and hence in code rate. The potential impact on code rate caused by this Sensor Noise is discussed in Appendix D. Although no hard numbers are available at this time, the potential increase in Cross-track prediction entropy for some of the latest high-performance instruments could be as high as 1 bit/sample.

But note that a prediction based on the same sample in the previous line (e.g., $x_{1,1}(j-1)$ in Eq. 114) would not be affected, since both samples originate from the same sensor element as a result of sweeping the line of sensor elements (see Fig. 23). Then, assuming symmetric data characteristics, a lower prediction entropy would be achieved by predicting using only the "previously generated" line. By extending Eq. 114 to all the samples of a line, we have

$$\hat{x}_{i,\ell}(j) = x_{i,\ell}(j-1) \qquad (117)$$

Note that using this kind of "Down-track" predictor does not mean that a large multi-line buffer is necessary. Only the previous line is needed to produce the predictions in (117). The coding still takes the form in (116). Only the predictor has changed, as illustrated in Fig. 24.
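To make the argument concrete, here is a small Python sketch comparing pure cross-track (left-neighbor) prediction with the down-track predictor of (117). Two simplifications are assumptions of this sketch, not of the report: the mean absolute prediction error stands in as a rough proxy for prediction entropy, and the sensor noise is modeled as a fixed offset per sensor element (column). Gain mismatches, which would not cancel exactly, are not modeled. The scene and offsets are invented for illustration.

```python
from typing import List

def mean_abs_error(img: List[List[int]], direction: str) -> float:
    """Mean |Delta| for cross-track (left neighbor) or down-track (117) prediction.

    Only samples that have the required neighbor are counted.
    """
    errs = []
    for j, row in enumerate(img):
        for c, x in enumerate(row):
            if direction == "cross" and c > 0:
                errs.append(abs(x - row[c - 1]))
            elif direction == "down" and j > 0:
                errs.append(abs(x - img[j - 1][c]))
    return sum(errs) / len(errs)

# Hypothetical smooth scene: 4 lines (down-track) by 4 sensor elements (cross-track).
scene = [[100 + j + c for c in range(4)] for j in range(4)]
# Assumed per-element offsets, constant down-track because each column is one sensor.
offsets = [0, 5, -3, 7]
img = [[s + o for s, o in zip(row, offsets)] for row in scene]
```

Here the offsets inflate the cross-track error from 1.0 (the clean `scene`) to 8.0, while the down-track error stays at 1.0, because each down-track difference subtracts the same sensor's offset from itself.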
Fig. 24. 1-D Down-Track Prediction Strategy
for an Uncorrected Swept-Line Sensor 

A similar look at a two-dimensional predictor reveals that it would be affected less than a left-to-right 1-D prediction, because only one of the terms in (112) has noise riding on it. But typically the most one can expect to gain from a two-dimensional prediction is about a 0.5 bit/sample reduction in entropy. This advantage might easily be canceled by the above-mentioned noise effects for some current instruments.


More General 2-D. It should be noted that the presumed line-by-line, left-to- 
right ordering in the examples described above is really a convention, not a constraint. 
The real constraint is spelled out in (96) to (98): The coder and decoder must 
ultimately use the same prediction for each sample that is coded and decoded. There 
is no reason that lines could not be coded in opposite directions, or in any order, 
provided that appropriate modifications are made to the prediction strategies to assure 
compatibility between coder and decoder. In fact, we will need to take some liberty in 
such assumptions in later discussions to avoid getting bogged down in notation. 
In particular, to aid discussion of higher level issues, we will use

PSI2D (118)

to denote a general-purpose 2-D array coder, of which Figs. 22 and 23 and Eqs. 115 and 116 provide good examples. When we specify PSI2D, we will generally ignore coding details and focus on prediction strategies and the relationship of one array to another.

Extreme Reordering Example. An extreme example of reordering the coding and communication of a 2-D array is shown in Fig. 25. Here it is presumed that:

1) A large 2-D array, $A$, is partitioned into 16 equal-size sub-arrays,

$$A_1, A_2, \ldots, A_{16} \qquad (119)$$

2) All the data (each $A_i$) are generated in a buffer before coding begins. (120)

3) A prescribed order of coding and communication of the $A_i$ is

$$A_7, A_2, A_5, A_3, A_6, A_{15}, A_8, A_{13}, A_{10}, A_{11}, A_1, A_9, A_{14}, A_4, A_{12}, A_{16} \qquad (121)$$
[Figure: spatial order of coding of the 16 sub-arrays, $O(A) = O_1 O_2 \cdots O_{16} = 7, 2, 5, 3, \ldots, 4, 12, 16$]

Fig. 25. Example: Changing the Order of Coding 2-D Sub-Arrays

More generally, we see that (121) is a special case for an ordering specified by

$$O(A) = O_1 O_2 \cdots O_{16} \qquad (122)$$

where $O_j$ = array number for the $j$th coded sub-array. For the specific example in (121) and Fig. 25

$$O(A) = 7, 2, 5, 3, 6, 15, 8, 13, 10, 11, 1, 9, 14, 4, 12, 16 \qquad (123)$$

Clearly, the coding of each $A_i$ can take the form in (115) and (116), or more generally PSI2D in (118). The only significant unanswered questions are: How are the predictions of $A_i$ samples specified? And do we need to communicate additional information on the transmission ordering of the $A_i$?






Case I: The $A_i$ are independently coded (no information from surrounding sub-arrays is used), and the ordering of sub-array transmission is known a priori at the coder and decoder.

No other information is needed, so the coding of $A$ will appear as

$$\mathrm{PSI2D}[A_7] * \mathrm{PSI2D}[A_2] * \mathrm{PSI2D}[A_5] * \cdots \qquad (124)$$

for the example specified in (119)-(121).

Case II is the same as Case I, except that the ordering of sub-array transmission (coding) is unknown a priori to the decoder. Then the coding of $A$ can be described in general by the format

$$O(A) * \mathrm{PSI2D}[A_{O_1}] * \mathrm{PSI2D}[A_{O_2}] * \cdots \qquad (125)$$

or

$$O_1 * \mathrm{PSI2D}[A_{O_1}] * O_2 * \mathrm{PSI2D}[A_{O_2}] * \cdots \qquad (126)$$

where the order numbers (see Eqs. 122 and 123) would, of course, be converted to binary numbers for use in (125) and (126).

For the specific ordering in (121), (122), and Fig. 25, (126) becomes

$$7 * \mathrm{PSI2D}[A_7] * 2 * \mathrm{PSI2D}[A_2] * \cdots \qquad (127)$$
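As a rough illustration of the Case II overhead, the Python sketch below packs $O(A)$ of (125) into fixed-length binary fields. The representation is an assumption, not taken from the report: each sub-array number $O_j \in \{1, \ldots, 16\}$ is stored as $O_j - 1$ in 4 bits, and the function names are hypothetical.

```python
from typing import List

def encode_order(order: List[int], bits: int = 4) -> str:
    """Pack O(A) as fixed-length binary fields (assumed: O_j - 1 in `bits` bits)."""
    return "".join(format(o - 1, f"0{bits}b") for o in order)

def decode_order(header: str, bits: int = 4) -> List[int]:
    """Recover O(A) from the packed header."""
    return [int(header[k:k + bits], 2) + 1 for k in range(0, len(header), bits)]

ORDER = [7, 2, 5, 3, 6, 15, 8, 13, 10, 11, 1, 9, 14, 4, 12, 16]  # (123)
```

Under this assumption, the ordering of (123) costs 16 × 4 = 64 bits whether it is sent up front as in (125) or interleaved as in (126); the first field, for $A_7$, is `0110`.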

Case III is the same as Case II, except that the predicted boundary samples of any $A_i$ can make use of adjacent samples from surrounding sub-arrays.

The additional rule to follow is simply that the boundary samples from an adjacent sub-array can be used in a prediction IF that sub-array has already been coded and transmitted (so a decoder would be able to use the same data). To illustrate, Fig. 25 exhibits arrows pointing across the boundaries between sub-arrays. An arrow from $A_k$ into $A_i$ means that the corresponding adjacent samples of $A_k$ can be used to improve the prediction of $A_i$. In the Fig. 25 example, the top row of a sub-array that has already been transmitted can aid the prediction of the bottom row of the sub-array directly above it. For those bottom-row samples, the prediction of (117) would become $\hat{x}_{i,\ell}(j) = x_{i,\ell}(j+1)$ instead.

Of course we are ignoring details, but it is important to note that the algorithm which specifies how (and if) the coding of a particular $A_i$ will use the samples from (the boundaries of) adjacent sub-arrays can be determined solely by the order of transmission. No other information is necessary to tell a decoder what the coder did.
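The point that the order of transmission alone fixes which boundaries may be used can be sketched in Python. The grid layout is a hypothetical assumption, since the report does not spell out the numbering in Fig. 25: sub-arrays are numbered 1..16 row-major on a 4 × 4 grid, and only edge-sharing (4-connected) neighbors count. Coder and decoder can both run this function on $O(A)$ and reach the same answer.

```python
from typing import Dict, List

def usable_neighbors(order: List[int], side: int = 4) -> Dict[int, List[int]]:
    """For each sub-array, list the adjacent sub-arrays transmitted before it.

    Assumption: sub-arrays numbered 1..side*side in row-major order.
    """
    slot = {a: k for k, a in enumerate(order)}     # transmission position of each A_a
    result: Dict[int, List[int]] = {}
    for a in order:
        r, c = divmod(a - 1, side)
        nbrs = []
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < side and 0 <= cc < side:
                b = rr * side + cc + 1
                if slot[b] < slot[a]:              # b already decoded, so usable
                    nbrs.append(b)
        result[a] = sorted(nbrs)
    return result

ORDER = [7, 2, 5, 3, 6, 15, 8, 13, 10, 11, 1, 9, 14, 4, 12, 16]  # (123)
```

With the order of (123), the first sub-array sent ($A_7$) has no usable neighbors, while $A_6$ can draw on $A_2$, $A_5$, and $A_7$, and $A_{16}$ (sent last) on $A_{12}$ and $A_{15}$.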

Whether the additional coding gains obtained by improved prediction on the 
sub-array boundaries are worth the additional complexity is another question 
altogether. 

An Archive. The manner in which an array $A$ is used can produce other constraints which may override the quest for maximum coding efficiency. While the arrangement suggested in Case III above might provide a slight improvement in efficiency over Cases I and II (a lot depends on the size of the $A_i$), full knowledge of all $A_{O_k}$, $k < j$, is required to decode $A_{O_j}$. In the worst case, all of $A$ must be decoded in order to decode the last $A_i$ ($A_{16}$ for the specific order in (123)).

In an image archive, A might represent a complete image, and the Aj might 
represent spatial subimages (with typically many more than 16 subdivisions). In this 
case it is clearly desirable to be able to directly access and decode selected Aj without 
the need to decode a complete image. 

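All of $A$ could be represented as in Case I and (124), repeated here for convenience, now assuming a standard ordering $A_1, A_2, A_3, \ldots, A_{16}$:

$$\mathrm{PSI2D}[A_1] * \mathrm{PSI2D}[A_2] * \mathrm{PSI2D}[A_3] * \cdots \qquad (128)$$

where PSI2D[$A_1$] marks the start of the image, PSI2D[$A_2$] the start of $A_2$, PSI2D[$A_3$] the start of $A_3$, and so on.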

Now, each PSI2D[$A_i$] is variable length and embedded within the longer coded string of (128). The problem is that a decoder can extract $A_i$ without decoding the sub-arrays in front of it only if it knows where PSI2D[$A_i$] begins in the overall coded string of binary data in (128). Without this specific knowledge, $A_1, A_2, \ldots, A_{i-1}$ would have to be decoded first.

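So, let¹⁴

$$\ell_i = \ell(\mathrm{PSI2D}[A_i]) \qquad (129)$$

be the length in bits of the coded $A_i$, and

$$L(A) = \ell_1\, \ell_2 \cdots \ell_{16} \qquad (130)$$

Then, $A$ can be coded and stored serially as

$$L(A) * \mathrm{PSI2D}[A_1] * \mathrm{PSI2D}[A_2] * \mathrm{PSI2D}[A_3] * \cdots \qquad (131)$$

or

$$\ell_1 * \mathrm{PSI2D}[A_1] * \ell_2 * \mathrm{PSI2D}[A_2] * \ell_3 * \mathrm{PSI2D}[A_3] * \cdots \qquad (132)$$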


In (131), $L(A)$ is maintained separately as a table of pointers to the relative positions in memory where each $A_i$ begins. In (132), each $\ell_i$ is used to jump over each coded $A_i$ until the desired one is reached.¹⁵ Any individual $A_i$ can now be quickly found and decoded without the need to decode any other $A_k$. Keep in mind that this example illustrates an approach that speeds the determination of the location of a variable-length string in a serial data stream. For the archive problem, the $A_i$ could just as easily be one- or three-dimensional arrays. Additionally, exactly the same problem results when communication errors threaten to disrupt the decoding process.
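A minimal Python sketch of both storage forms follows. It assumes, in the spirit of footnote 15, that each coded PSI2D[$A_i$] has already been padded to whole bytes, so lengths are byte counts. In the interleaved (132) form, each length is assumed to fit in a single leading byte. All function names are hypothetical.

```python
from typing import List, Tuple

def build_archive(coded: List[bytes]) -> Tuple[List[int], bytes]:
    """(131)-style storage: a length table L(A) plus the serial coded stream."""
    lengths = [len(c) for c in coded]            # l_1 l_2 ... l_16
    return lengths, b"".join(coded)

def extract(lengths: List[int], stream: bytes, i: int) -> bytes:
    """Return coded A_i (1-based) using only the table; nothing else is decoded."""
    start = sum(lengths[: i - 1])                # prefix sum of preceding lengths
    return stream[start:start + lengths[i - 1]]

def extract_interleaved(stream: bytes, i: int) -> bytes:
    """(132)-style storage: each coded A_k is preceded by a 1-byte length l_k."""
    pos = 0
    for _ in range(i - 1):
        pos += 1 + stream[pos]                   # jump over l_k and PSI2D[A_k]
    return stream[pos + 1:pos + 1 + stream[pos]]
```

The table form locates any $A_i$ with one prefix sum, while the interleaved form needs $i-1$ hops but no separate table; neither touches the contents of the skipped sub-arrays.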


¹⁴ This $\ell_i$ has no relationship to the $\ell$ in Eq. 28.

¹⁵ Note that in both cases, the $\ell_i$ would typically be reduced to word, or byte, pointers rather than bit pointers.
ID CODING


Consider again the coding of a sequence of data blocks, this time from a more general point of view. We'll get more specific later. As before, we take

$$X = X_1 * X_2 * \cdots * X_m \qquad (133)$$

as the sequence of $m$ blocks to be coded. Each block is to be coded by an $N$-option adaptive PSI11 coder, as defined in (21)-(24). We need not be too specific here.

The $N$ code options for this PSI11 are given as

$$\Psi_{\alpha_0}, \Psi_{\alpha_1}, \ldots, \Psi_{\alpha_{N-1}} \qquad (134)$$

so that the form of a coded $X_i$ is

$$\Psi_{11}[X_i] = \mathrm{ID}(id_i) * \Psi_{\alpha_{id_i}}[X_i] \qquad (135)$$

where

$$0 \le id_i \le N-1 \qquad (136)$$

is the code identifier for block $X_i$ and

$$\mathrm{ID}(id_i) \qquad (137)$$

is its standard $\lceil \log_2 N \rceil$-bit binary representation.

Now, define

$$id = id_1 * id_2 * \cdots * id_m \qquad (138)$$

as a block of all the identifiers and

$$\mathrm{ID}(id) = \mathrm{ID}(id_1) * \mathrm{ID}(id_2) * \cdots * \mathrm{ID}(id_m) \qquad (139)$$

as the corresponding sequence of standard binary representations.

Rearranging the Coding of X 

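The basic form for coding all of $X$ is simply¹⁶

$$\Psi_{11}[X_1] * \Psi_{11}[X_2] * \cdots * \Psi_{11}[X_m] \qquad (140)$$

or, expanding from (135),

$$\mathrm{ID}(id_1) * \Psi_{\alpha_{id_1}}[X_1] * \mathrm{ID}(id_2) * \Psi_{\alpha_{id_2}}[X_2] * \cdots * \mathrm{ID}(id_m) * \Psi_{\alpha_{id_m}}[X_m] \qquad (141)$$

But this can be rearranged, without jeopardizing the decoding process and without any difference in performance, as

$$\mathrm{ID}(id) * \Psi_{\alpha_{id_1}}[X_1] * \Psi_{\alpha_{id_2}}[X_2] * \cdots * \Psi_{\alpha_{id_m}}[X_m] \qquad (142)$$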
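The claim that (141) and (142) carry the same bits and remain decodable can be checked with a toy Python sketch. Several choices here are assumptions of this sketch, not the report's: four split-sample options $\Psi_{1,k}$ with $k = 0..3$ stand in for the $\alpha$-options, blocks are $J = 4$ already-mapped non-negative integers, identifiers take 2 bits, and all names are hypothetical. Decoding (142) works because the decoder reads all $m$ identifiers first, then knows which option to parse for each block.

```python
from typing import List

J, ID_BITS = 4, 2   # hypothetical: 4-sample blocks, 4 options (k = 0..3 split bits)

def psi1k(block: List[int], k: int) -> str:
    """Split-sample option: FS (unary) code of d >> k, then the k raw LSBs of d."""
    out = []
    for d in block:
        out.append("0" * (d >> k) + "1")
        if k:
            out.append(format(d & ((1 << k) - 1), f"0{k}b"))
    return "".join(out)

def best_option(block: List[int]) -> int:
    # Adaptive choice: shortest coded block wins (ties go to the smaller k).
    return min(range(1 << ID_BITS), key=lambda k: len(psi1k(block, k)))

def code_141(blocks: List[List[int]]) -> str:
    """ID(id_1) * Psi[X_1] * ID(id_2) * Psi[X_2] * ..."""
    return "".join(format(best_option(b), f"0{ID_BITS}b") + psi1k(b, best_option(b))
                   for b in blocks)

def code_142(blocks: List[List[int]]) -> str:
    """ID(id) * Psi[X_1] * Psi[X_2] * ...: all identifiers first (the SPLIT form)."""
    ids = [best_option(b) for b in blocks]
    header = "".join(format(k, f"0{ID_BITS}b") for k in ids)
    return header + "".join(psi1k(b, k) for b, k in zip(blocks, ids))

def decode_142(bits: str, m: int) -> List[List[int]]:
    ids = [int(bits[n * ID_BITS:(n + 1) * ID_BITS], 2) for n in range(m)]
    pos, blocks = m * ID_BITS, []
    for k in ids:
        block = []
        for _ in range(J):
            q = bits.index("1", pos) - pos       # number of FS zeros
            pos += q + 1
            r = int(bits[pos:pos + k], 2) if k else 0
            pos += k
            block.append((q << k) | r)
        blocks.append(block)
    return blocks
```

For the blocks `[[0, 1, 0, 2], [9, 12, 8, 10]]`, the chosen identifiers are 0 and 3, and both arrangements occupy 31 bits. Only the position of the identifier bits differs.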

What have we gained? Thus far, we have only incurred an additional buffering requirement. However, it should now be clear that ID(id) represents the fixed-length coding of a sequence of symbols from a new data source: the code identifiers. If the entropy of this new source is less than a fixed-length representation requires, there may be room for improvement. This suggests the coding structure shown in Fig. 26, where we have used SPLIT to indicate the operation of separating a coded block's $id_i$ and $\Psi_{\alpha_{id_i}}[X_i]$. We have named this structure PSI16, so that¹⁷

¹⁶ From previous discussions, additional header information might be required (e.g., REF, O(A), etc.), depending on the actual coding process.

¹⁷ Again, note that, for the sake of simplicity, we have left out the possible additional header information that may be required, such as REF, O(A), etc. In this case, information could be required for both the original data $X$ and the $id$.
Fig. 26. PSI16, Coding the Identifiers


$$\Psi_{16}[X] = \Psi_a[id] * \Psi_{\alpha_{id_1}}[X_1] * \cdots * \Psi_{\alpha_{id_m}}[X_m] \qquad (143)$$

where PSIa denotes the "to-be-determined" coder to be used to represent the $m$-sample identifier sequence, $id$. In the basic representation of (140), PSIa = ID[ ].

The Code Rate 

Let

$$A = \sum_{i=1}^{m} \ell(\Psi_{\alpha_{id_i}}[X_i]) \text{ bits} \qquad (144)$$

and

$$B = \ell(\Psi_a[id]) \text{ bits} \qquad (145)$$

Then, the average code rate for PSI16 is

$$\bar{\Psi}_{16} = \frac{E\{\ell(\Psi_{16}[X])\}}{\ell(X)} = \frac{E\{A\}}{\ell(X)} + \frac{E\{B\}}{\ell(X)} \text{ bits/sample} \qquad (146)$$

Without loss of generality we can assume that the block size and the number of code options are the same for every block, set to $J$ and $N$, respectively. Then we have

$$\ell(X) = mJ = \text{fixed} \qquad (147)$$

Now, in considering what happens as N and J are varied, we will assume that 
m (the number of blocks in X) is adjusted to maintain the length of X at a fixed value. 

Contributions From A. The statistical term (denominator fixed)

$$\frac{E\{A\}}{mJ} \qquad (148)$$

is a function of N, J, and the characteristics of the data. Clearly, we can say that 

1) Larger N means that there are more code options to choose from over a given block size J. Thus, this term would tend to decrease with increasing N. Of course, real improvements will only occur if the added options are actually used (i.e., only if they are chosen because they provide better performance than the other options).

2) As J is decreased, the decision to use one of N code options is made 
more often. The coder is thus able to respond more rapidly to changes in 
data entropy and can thus code more efficiently. Since the penalty of 
added code identifier bits is not included in E{A}, this term will tend to 
decrease with decreasing J. 
We cannot be more precise in evaluating E{A} without getting considerably more
specific. 


Code Identifiers. Now, consider the second term, which corresponds to the contribution from code identifiers,

$$\frac{E\{B\}}{mJ} \qquad (149)$$

If we assume a standard fixed-length binary representation, as in (140), we have

$$B = E\{B\} = m\lceil \log_2 N \rceil \text{ bits}$$

so that (149) is reduced to the nonstatistical

$$\alpha_F(N,J) = \frac{\lceil \log_2 N \rceil}{J} \text{ bits/sample} \qquad (150)$$


$\alpha_F$ is plotted in Fig. 27 for various N and J of interest. The structure in Fig. 26 was devised because of the potential for reducing the code-rate penalty implied by $\alpha_F$.
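Equation (150) is simple enough to tabulate directly. The short Python function below (its name is hypothetical) computes $\lceil \log_2 N \rceil$ with integer arithmetic via `(N - 1).bit_length()`, which avoids floating-point rounding at exact powers of two.

```python
def alpha_f(n_options: int, block_size: int) -> float:
    """Fixed-length identifier penalty of (150): ceil(log2 N) / J bits/sample."""
    id_bits = (n_options - 1).bit_length()   # exact integer ceil(log2 N) for N >= 1
    return id_bits / block_size
```

For example, any $9 \le N \le 16$ with $J = 16$ gives 0.25 bit/sample, $N = 8$ gives 0.1875, and halving the block size to $J = 8$ doubles the penalty to 0.5.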

The total code rate in (146) becomes

$$\bar{\Psi}_F(N,J) = \frac{E\{A\}}{\ell(X)} + \alpha_F(N,J) \qquad (151)$$

Note that $\alpha_F$ increases with N and decreases with J, just the opposite of the tendencies exhibited by the first term in (146) and (151). To investigate this any further we must be more specific.
Fig. 27. Code Identifier Code Rate,
Fixed-Length Representation 


Application to PSI14,K+

Now, let the primary block coder in Fig. 26 be PSI14,K+.

Fixed-Length Identifiers. For most practical applications (e.g., imaging), simulations show that $\bar{\Psi}_F(N,J)$ in Eq. 151 behaves something like the graph in Fig. 28. That is, from an overall performance point of view, the choice of block size J is not critical, provided that it is neither too small nor too large. Since there is generally an implementation advantage to smaller block sizes, most applications to date have used a convenient J = 16 (a power of 2) near the lower end of this range of "best performance." Thus, in many situations, variations in the two terms in Eq. 151 cancel each other out.
[Figure: $\bar{\Psi}_F(N,J)$ versus block size in samples, 0 to 40]

Fig. 28. $\bar{\Psi}_F(N,J)$ for a PSI14,K+ Primary Coder

Coding the Identifiers. Now, fix the PSI14,K+ structure in Fig. 26 and consider situations where the coding of identifiers might be beneficial and for which we can identify PSIa.

For illustration, let

$$m = 16, \quad J = 16 \qquad (152)$$

(a realistic assumption). This means that each $X$ in (133) is a sequence of 16 blocks of 16 samples each. Additionally, assume that the number of available code options is

$$9 \le N \le 16 \qquad (153)$$

By Fig. 27 or Eq. 150, the penalty for a fixed-length code identifier representation is 0.25 bit/sample (or 4 bits per identifier). For our example, this is the most we can expect to reduce the overall code rate by prescribing a different PSIa.

The desired form for PSIa in Eq. 143 and Fig. 26 becomes readily apparent by 
drawing an analogy from the plots of entropy in Figs. 29 and 30. 
These figures show a hypothetical, but not unrealistic, graph of average source entropy plotted for a series of J = 16 sample blocks. The slowly varying entropy function is shown for 1,000 such blocks in Fig. 29, and Fig. 30 then expands the region for blocks 400-500.

Note that these graphs look much like the waveforms generated by typical data sources to which PSI14,K+ would generally apply. In fact, this is the key to specifying an appropriate practical form for PSIa.

Recall how PSI14,K+ codes a block of sampled data. It compares the performance of a set of individual code options (repeated here from (73) where, without loss of generality, we'll assume K = 0, since we're focusing on codes with many options):

$$\Psi_{1,\lambda+id} \quad \text{for } \lambda > 0 \text{ and } 0 \le id \le N-2 \qquad (154)$$

and

$$\Psi_3 \quad \text{for } id = N-1 \qquad (155)$$

(where $\lambda$ sets the starting option and $id$ is the code identifier for a given option) and chooses to use the best one, say with code identifier¹⁸

$$id = id^{*} \qquad (156)$$

It is now important to recall the performance characteristics for each option as the identifier number is increased. Except for the first option and those with very large id values, each individual option exhibits a Dynamic Range of efficient performance of about 1 bit/sample. These 1-bit Dynamic Ranges are both adjacent and non-overlapping [11]. Thus, the Dynamic Range for the option with $id = \ell$ sits 1 bit below the Dynamic Range of the option corresponding to $id = \ell + 1$. This is shown more explicitly in Fig. 30 where, with $\lambda = 1$ in (154), the code identifiers for code options PSI1,id+1 are shown adjacent to their corresponding Dynamic Ranges.

¹⁸ For example, with $\lambda = 1$ and N = 16 the codes in order of increasing id are PSI1, PSI1,1, PSI1,2, ..., PSI1,15, PSI3 for a 16-option coder, corresponding to PSIss in Ref. [7].

Code option PSI1,id+1 will be chosen most frequently when the source entropy lies within its 1-bit Dynamic Range ($id + 1.5 < H < id + 2.5$). For example, in the graph of Fig. 30, the identifier would tend to switch between id = 3 and id = 4 over blocks 400-425. Then, id = 4 would predominate for the next 25 blocks as average entropy rises. And so on.

Thus, we can interpret the generation of code identifiers as a "sampling" of the 
entropy waveform with a quantization accuracy of 1 bit of entropy per quantization 
interval (i.e., each code option). Hence, the data source represented by a sequence of 
code identifiers behaves much like the data sources that PSI14.K+ itself was designed 
to code efficiently. We should be able to follow a similar approach here. 

In fact, for the example in (152) and (153) we should be able to use a properly configured PSI14,K+.

Recall that the "+" in "PSI14,K+" refers to those steps that precede the actual 
coding of 5 n sequences by adaptive variable-length coding. In general, those steps 
could be arbitrary. In fact, the definition of a PSI14.K+ "module" presumes that pre- 
processing can be external and, therefore, arbitrary. However, we can be more 
specific for the identifier coding problem. The same kind of predictive preprocessing 
considerations apply to the coding of identifiers as to more typical forms of data. For 
now it suffices to note that the simple PSI14.K+ built-in one-dimensional predictor (or 
external predictor) and mapper from Chapter II directly apply here. We will return to 
this issue momentarily. 

Then, assuming that a preprocessor has been specified, let¹⁹

PSIa = PSI14,0+

with parameters $n_a$, $J_a$, $\lambda_a$, and $N_a$ given as

$n_a$ = Number of input quantization bits = $\lceil \log_2 N \rceil$ = 4

$J_a$ = Block length = $m$ = 16

$\lambda_a$ = Starting Option Parameter for PSI0 = 0

$N_a$ = Number of options = 4

¹⁹ Or the original PSI4 would also cover the required range of "id entropy."

Thus, the code options are, from (154) to (155),

PSI0, PSI1, PSI1,1, and PSI3 (157)

That is, more simply,

PSIa ≡ PSI14+ with $\lambda = 0$ (158)

The source of identifiers that we are now coding takes on values

$$id = 0, 1, 2, \ldots, 14, 15 \qquad (159)$$

with the value of one sample distributed about the value of an adjacent sample (with adjustments near the boundary values of 0 and $N-1$). We have seen this before. The one-dimensional predictor in (14) and the specialized mapping of (18)-(19) apply directly, without modification.

The typical performance of the 4-option coder should adequately cover the 1-4 bit entropy range, as shown in Fig. 16 (although the latter graph was obtained from data with a much larger number of quantization levels). If much lower identifier entropies can be anticipated, other algorithms could be applied (e.g., PSI9 in Ref. 3, or run-length coding). For this example, the coder for PSI16 in Fig. 26 reduces to that shown in Fig. 31.
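The following Python sketch codes one block of 16 identifiers in the spirit of Fig. 31. It is an approximation under stated assumptions. The 1-D predictor uses the previous identifier, with `ref` as the hypothetical reference. The mapper is the CCSDS 121.0-style prediction-error mapper, assumed here to play the role of (18)-(19). Of the options in (157), only PSI1, PSI1,1, and PSI3 are modeled; the low-entropy PSI0 option is omitted for brevity. The 2-bit option identifier is counted in the total.

```python
from typing import Dict, List, Tuple

def map_error(x: int, xhat: int, xmin: int = 0, xmax: int = 15) -> int:
    """Map a prediction error to a non-negative integer (CCSDS 121.0-style mapper).

    Small errors of either sign map to small values; the range is folded so
    every output lies in 0..xmax-xmin.
    """
    delta = x - xhat
    theta = min(xhat - xmin, xmax - xhat)
    if 0 <= delta <= theta:
        return 2 * delta
    if -theta <= delta < 0:
        return -2 * delta - 1
    return theta + abs(delta)

def split_cost(deltas: List[int], k: int) -> int:
    # PSI1,k: FS code of (d >> k) costs (d >> k) + 1 bits, plus k raw LSBs.
    return sum((d >> k) + 1 + k for d in deltas)

def code_identifiers(ids: List[int], ref: int, n_bits: int = 4,
                     option_bits: int = 2) -> Tuple[str, int]:
    """Pick the cheapest of PSI1, PSI1,1 and PSI3 for one identifier block.

    Returns (chosen option, total bits including the option identifier).
    """
    preds = [ref] + ids[:-1]                  # 1-D predictor: previous identifier
    xmax = (1 << n_bits) - 1
    deltas = [map_error(x, p, 0, xmax) for x, p in zip(ids, preds)]
    costs: Dict[str, int] = {
        "PSI1": split_cost(deltas, 0),
        "PSI1,1": split_cost(deltas, 1),
        "PSI3": n_bits * len(ids),            # fixed-length backup option
    }
    best = min(costs, key=costs.get)
    return best, costs[best] + option_bits
```

For a slowly drifting identifier block such as `[3, 3, 4, 4, 4, 3, 4, 4, 4, 4, 5, 5, 4, 4, 4, 4]` with `ref = 3`, PSI1 wins at 26 bits in total, about 0.10 bit/sample over the 256 primary samples, against 0.25 bit/sample for fixed-length identifiers. A wildly alternating block such as `[0, 15] * 8` falls back to PSI3.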
Fig. 31. PSI16 Coder of Identifiers Example


As already noted, the difference in performance between this coder and that provided by a standard PSI14,K+ representation lies entirely in the penalty imposed by the identifier bits. For the fixed-length representation, this penalty is $\alpha_F(N,J)$ in Eq. 150 and Fig. 27, or 0.25 bit/sample with $8 < N \le 16$ and J = 16, as in this example.

Under worst-case conditions, the PSI14,0 coding of id in Fig. 31 doesn't work at all, so that PSI3 is used on all identifier blocks. But PSI3 is equivalent to the fixed-length representation. Thus, the total penalty for identifier representation is at most $\alpha_F(N,J)$ plus the penalty to specify which code option was used to represent the identifier blocks themselves.

Assuming the 4-option PSI14,0 in Fig. 31, this penalty cannot be larger than 2 bits per $X$ sequence (from which a block of identifiers is extracted). Since each $X$ holds $mJ = 256$ samples, this computes to 2/256 = 1/128 < 0.01 bit/sample.
But note that actual conditions may be quite different from these worst-case conditions. If the entropy is gradually changing, as in the hypothesized plots of Figs. 29 and 30, the identifier predictions would produce small errors, so that code rates between 1 and 2 bits/identifier might be a reasonable expectation. This would reduce the identifier penalty (and hence the overall average code rate) to 0.0625 to 0.125 bit/sample. In the limiting case (all PSI0 occurrences, i.e., the same code used for all primary blocks), the identifier cost would drop to 0.03 bit/sample. Overall, this says that such identifier coding might pick up as much as 0.2 bit/sample when stationary conditions prevail, but would not lose anything significant when data characteristics are rapidly changing.

These gains would double if a primary block size of J = 8 were used, so that the fixed identifier penalty for an N > 8 option PSI14,K+ is 0.5 bit/sample. Thus, identifier coding might be most appropriate for situations in which the source is sometimes rapidly changing (where smaller block sizes can be advantageous) and at other times quite stationary.
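The penalty figures quoted above follow from one line of arithmetic: $m$ identifiers at some average cost, plus any option-identifier overhead, spread over the $mJ$ primary samples. The small Python helper below (hypothetical name) reproduces them. The 0.03 bit/sample limiting case is not reproduced, since it depends on details of PSI0 not given here.

```python
def identifier_penalty(bits_per_id: float, m: int = 16, J: int = 16,
                       option_bits: int = 0) -> float:
    """Identifier cost in bits/sample: m identifiers (plus option-identifier
    overhead) spread over the m*J primary samples."""
    return (m * bits_per_id + option_bits) / (m * J)
```

With the example values m = J = 16, fixed 4-bit identifiers cost 0.25 bit/sample; the worst case adds the 2-bit option identifier for 66/256 ≈ 0.258; 1 to 2 bits/identifier gives 0.0625 to 0.125; and J = 8 doubles the fixed penalty to 0.5.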

How General? It should be clear that the primary and secondary coding steps in Figs. 26 and 31 are independent of each other. That is, a primary PSI11 (or PSI14,K+) coder that operates on an input $X_i$ does not need to know about the later coding of identifiers by PSIa (or the reduced PSI14), and vice versa. Thus, for example, identifier coding can be done external to a VLSI module that performs the primary coding.

Because of this independence, one can look at the identifier coding problem as 
more than a one-dimensional problem, as we did earlier for the primary data stream. 
For example, the identifiers can be viewed as a two-dimensional array, just as in Figs. 
2, 22 and 24, where two-dimensional predictors were considered. Certainly for 
imaging applications, a prediction of the identifier (entropy) used when coding a 
particular block could be positively influenced by knowledge of the identifier (entropy) 
used by the block above it. In fact, the block above offers more pertinent prediction 
information than an adjacent block along the same line (the block above is located in 
the same spatial region). 

Iterative PSI16. While not of practical value for the real data sources normally encountered, it is worthwhile to note that PSIa in Fig. 26 could itself be another PSI16 (with different parameters). In that case, the identifiers used to code the primary identifiers would also be coded, and so on. A similar iterative structure results from the binary coding algorithms in Ref. 3.

RATE CONTROL 

Not all applications have the luxury of a data system that can absorb the 
naturally occurring variations in output rate from a noiseless coder. Often there is a 
physical constraint that sets a fixed number of bits that can be used to represent blocks 
of data. This may come in the form of memory restrictions and/or transfer-rate 
limitations. 

The case of primary interest here is one in which the number of bits available to 
represent data blocks is close to what is necessary to represent those blocks efficiently 
(close to the entropy) by using noiseless coding. If the number of bits available is 
close, but not sufficient, some loss in data fidelity must occur. But this loss should be 
minor since almost enough bits are available (for a perfect reconstruction). The 
resulting data reconstruction should be imperfect but "near-noiseless." 

In this section, we will first look at typical rate or buffer restrictions placed on 
imaging science applications. We will then introduce simple modifications to the 
coding process that should provide a practical measure of control over how this
near-noiseless error will occur. We later expand this approach to the "rate control" of the
multispectral HIRIS instrument. 

The FIFO Buffer Constraint 

The general rate-constraint problem we are addressing is ultimately one in 
which a fixed, and presumably limited, number of bits can be used to represent a 
certain span of data. Such a constraint can arise from many factors but often 
materializes in the form of a buffer that cannot hold all the data generated for the span 
of data. We will tie much of our future discussion to, and obtain motivation from, the 
constraints imposed by the assumption of "First-In First-Out" (FIFO) buffers, as 
described below. 
The concept of a FIFO Buffer as the vehicle for a rate constraint is shown in Fig.
32.

Fig. 32. The FIFO Buffer (dummy zeroes are added if the coded data do not fill it)

Here, a data sequence Y (which could be an image line, a single data block, a
complete image, etc.) is coded by a noiseless coder

$$\Psi_{?} \qquad (160)$$

which we have intentionally left arbitrary, since the details of coding are immaterial at
this point. The coded result PSI?[Y] is passed into a FIFO Buffer of length $\ell_F$ bits.
Then, the basic assumption is:

Exactly $\ell_F$ bits will be communicated for Y. (161)

This means that:

1) If $\ell(\Psi_{?}[Y]) > \ell_F$, all bits of PSI?[Y] after the first $\ell_F$ will be
truncated. (162)

2) If $\ell(\Psi_{?}[Y]) < \ell_F$, dummy bits will be concatenated to the end of
PSI?[Y] to make the total length equal to $\ell_F$. (163)
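Rules (161)-(163) reduce to a single truncate-or-pad step. The sketch below is minimal: it assumes the coded output is held as a string of '0'/'1' characters, and the function name `fifo_constrain` is hypothetical.

```python
def fifo_constrain(coded_bits: str, buffer_len: int) -> str:
    """Exactly buffer_len bits leave the FIFO for each coded sequence (161)."""
    if len(coded_bits) > buffer_len:
        return coded_bits[:buffer_len]                          # (162): drop the tail
    return coded_bits + "0" * (buffer_len - len(coded_bits))   # (163): dummy zeroes


out_long = fifo_constrain("1101001", 5)    # '11010'  (two bits truncated)
out_short = fifo_constrain("101", 6)       # '101000' (three dummy zeroes)
```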

Standard Image Format. Recall the familiar standard format for the
encoding of images illustrated in Fig. 33. Lines are coded top-to-bottom and left-to-right.
From earlier notation, X(j) represents the jth line of a T-line image, with
samples running left-to-right.



Fig. 33. Standard Format Line Coding and Buffering 


Lines are coded one at a time, again by using the arbitrary noiseless coder PSI?, as in
Fig. 32 and (160).

The coded result, PSI?[X(j)] for the jth line, is shown being placed in a FIFO
Buffer of length

$$\ell_F = \rho L \ \text{bits} \qquad (164)$$

where

$$L = \ell(X(j)) \ \text{bits} \qquad (165)$$

is the length of an uncoded line, and

$$0 < \rho \le 1 + \epsilon \qquad (166)$$
Of course, the actual lengths of these coded lines will vary. If $\rho = 1 + \epsilon$, the
buffer can hold any line produced, including an uncoded line (we assume that PSI?
incorporates a PSI3 option and that $\epsilon$ is very small, but large enough to cover any
identifier overhead).

By assumption, in (161)-(163), the buffer length defines the rate constraint on a
"line." That is, the line must be communicated with precisely $\rho L$ bits, regardless of the
number required or used by PSI?. 20


Lines that use less than this quantity must be supplemented with dummy
zeroes, as in

$$\underbrace{\Psi_{?}[X(j)]}_{\text{coded line (shorter than buffer)}} * \underbrace{0\,0\,0 \cdots 0\,0\,0}_{\text{dummy zeroes}} \qquad \text{total length } \ell_F = \rho L \ \text{bits} \qquad (167)$$

The penalty here is, of course, efficiency.


Longest Line. If $\rho$ is decreased (shortening the buffer), the buffer size will
eventually decrease to the length of the longest coded line. Let

$$\rho_{\max} \qquad (168)$$

be the value of $\rho$ when this happens. At this point, all lines can still be communicated
error-free. If $\rho$ is smaller than $\rho_{\max}$, the ends of some lines must be truncated to
meet this constraint (see Fig. 32).


20 Such a constraint could arise in many ways, but its effect is easiest to visualize with 
this buffer. 
Average line. The average line length generated by PSI? for all lines in an
image is given as

$$\Lambda = \frac{1}{T} \sum_{j=1}^{T} \ell(\Psi_{?}[X(j)]) \ \text{bits} \qquad (169)$$

If PSI? incorporates PSI14,K+, we know that $\Lambda$ would typically be close to a differential
entropy measurement times the number of samples in a line.

Suppose that instead of a single line buffer, we used a full-image FIFO buffer to
store all coded lines before communication. Then $T\Lambda$ gives the minimum-length buffer
that would allow error-free coding (see (161)-(163)). But if we apply the same average rate
constraint to each line individually ($\ell_F = \Lambda$), some lines will be truncated since, in general,

$$\Lambda < \rho_{\max} L \qquad (170)$$

Thus, we immediately see the general advantage of larger buffers.
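The quantities in (168)-(170) follow directly from the list of coded line lengths. The following is a minimal sketch. The line lengths and L = 8192 bits (1024 samples of 8-bit data) are illustrative values, not measurements, and `line_buffer_stats` is a hypothetical name.

```python
def line_buffer_stats(coded_lengths, L):
    """coded_lengths[j]: bits produced by PSI? for line j of a T-line image.
    L: uncoded line length in bits.  Returns (Lambda, T*Lambda, rho_max)."""
    T = len(coded_lengths)
    lam = sum(coded_lengths) / T             # Lambda, Eq. (169)
    full_image = T * lam                     # smallest error-free full-image buffer
    rho_max = max(coded_lengths) / L         # (168): buffer equals the longest line
    return lam, full_image, rho_max


# Illustrative numbers only: 1024 samples of 8-bit data per line (L = 8192 bits)
lam, full_image, rho_max = line_buffer_stats([3000, 3400, 4100, 3500], L=8192)
# lam = 3500.0, full_image = 14000.0, rho_max * L = 4100.0 > lam, as in (170)
```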

Visualizing Truncation. To visualize what happens, let the buffer size be the
average line length, $\Lambda$. Coded line lengths of a single image will tend to be distributed
about $\Lambda$, although the shape may not be as perfect as illustrated in Fig. 34. But this
depiction will suffice here. Lines that are longer than $\Lambda$ will be truncated by an amount

$$\ell(\Psi_{?}[X(j)]) - \Lambda \ \text{bits} \qquad (171)$$

and shorter lines will be increased by the addition of

$$\Lambda - \ell(\Psi_{?}[X(j)]) \qquad (172)$$

dummy zeroes as in (167).
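Eqs. (171) and (172) can be tabulated per line with the short sketch below. It reuses illustrative line lengths, and the helper name `truncation_and_fill` is hypothetical.

```python
def truncation_and_fill(coded_lengths, buffer_len):
    """Per-line bits lost to truncation, Eq. (171), and dummy zeroes added,
    Eq. (172), for a FIFO line buffer of buffer_len bits."""
    truncated = [max(0, n - buffer_len) for n in coded_lengths]
    dummies = [max(0, buffer_len - n) for n in coded_lengths]
    return truncated, dummies


lengths = [3000, 3400, 4100, 3500]           # illustrative coded line lengths
lam = sum(lengths) // len(lengths)           # buffer length = average line = 3500
cut, pad = truncation_and_fill(lengths, lam)
# cut = [0, 0, 600, 0], pad = [500, 100, 0, 0]
```

With the buffer set exactly to the average, the bits lost to truncation equal the dummy bits sent. The image as a whole was therefore given just enough bits, yet 600 bits of one line were still lost.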

The bit truncation that occurs will ultimately truncate samples at the ends of 
lines, as shown in Fig. 35. 
Actual image examples of this phenomenon can be found in Ref. 5. Note that
because the characteristics of adjacent lines tend to be correlated, so too is the
magnitude of truncation that occurs from line to line.

Fig. 34. Example: Line-Length Distributions for
Single Image, Buffer Length = Average Line Length (marked: buffer length $\ell_F$)

[Fig. 35: image with truncated line ends]


Buffer Size and Error. The previous example was only intended to indicate
approximately how truncation errors might occur. In reality, depending on the
application, $\Lambda$ and the distribution of line lengths about it are often not well known and
may vary significantly from image to image as different types of scenes are
encountered.

As an example, HIRIS is a new experimental instrument with image-like
characteristics that has not actually been built yet. Only estimates of instrument
source characteristics are available to guide the choice of buffer sizes and estimate
the impact of rate control. The instrument has 192 spectral bands and 12 bits/sample
quantization, with each band acting like a separate imaging system. Data derived from
similar instruments have shown that band-to-band entropies could vary by as much as
6 bits/sample, depending on the prediction method used. This could mean a lot of fill
bits for the lower entropy bands if a single fixed line-buffer length must apply to all
bands (in order to minimize the truncations of higher entropy bands). The HIRIS
instrument data problem is looked at more closely in a later section.

FIFO Buffer Priority. While adjustments to line-buffer sizes can affect the
amount of truncation that occurs (at the expense of efficiency), we have so far ignored
the possibility of changing the form that truncation errors (e.g., Fig. 35) take into a more
acceptable one.

Another way to look at the truncation error that occurs in Fig. 35 is to note the
following: Since damage occurs on the right side of the picture, the picture elements
(samples) on the left side have "more priority" for the limited bits allowed per line than
do the samples on the right side. When the bits run out, the samples with the least
priority are deleted.

Alternate Lines. Now, alternate this priority from line to line: odd lines are coded
left-to-right, as before, and even lines right-to-left. By this we mean that alternate lines
are coded and loaded into the FIFO buffer in opposite directions. The result is shown in
Fig. 36. ODD lines are complete on the left side, and EVEN lines are complete on the right
side. But truncations occur on BOTH sides of the image.
Fig. 36. Coding Alternate Lines in Opposite Directions

The figure shows even and odd lines being coded in opposite directions. (The scale 
does not allow all lines to be visible.) As shown, the region of line truncation on the 
right side is the same as in Fig. 35, but every other line (the even) is now without error. 
Similarly, the region of line truncation on the left side is (approximately) a mirror image 
of the region on the right side, where now the odd lines are without error. A blowup of 
the upper left-hand corner of Fig. 36 is shown in Fig. 37. 
Fig. 37. A Blowup of the Upper Left-Hand Corner of Fig. 36

Since alternate lines are without error (on either side of the image), the missing
pixels of truncated lines can be "filled in" by interpolation. The net result of this
approach is that the missing data in Fig. 35 are traded for a loss of resolution on both
edges of the image. This procedure was implemented for the Voyager 2 encounters of
Uranus and Neptune.
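The fill-in step can be sketched under simplifying assumptions. Truncation is expressed in whole samples per line (`kept`) rather than in bits. Each missing sample is replaced by the rounded mean of the lines directly above and below, which is only one of many possible interpolators. `reconstruct_alternating` is a hypothetical name.

```python
def reconstruct_alternating(image, kept):
    """image: list of equal-length rows; kept[j]: samples of row j that
    survived its FIFO line buffer.  Rows 0, 2, ... (the report's odd lines)
    are coded left-to-right and lose their right ends; rows 1, 3, ... are
    coded right-to-left and lose their left ends.  Each missing sample is
    replaced by the rounded mean of the rows directly above and below."""
    rows, cols = len(image), len(image[0])
    recon = []
    for j, row in enumerate(image):
        k = kept[j]
        if j % 2 == 0:
            recon.append(row[:k] + [None] * (cols - k))
        else:
            recon.append([None] * (cols - k) + row[cols - k:])
    filled = [r[:] for r in recon]
    for j in range(rows):
        for i in range(cols):
            if recon[j][i] is None:
                nbrs = [recon[jj][i] for jj in (j - 1, j + 1)
                        if 0 <= jj < rows and recon[jj][i] is not None]
                filled[j][i] = round(sum(nbrs) / len(nbrs)) if nbrs else None
    return filled


image = [[10, 11, 12, 13],
         [20, 21, 22, 23],
         [30, 31, 32, 33],
         [40, 41, 42, 43]]
restored = reconstruct_alternating(image, kept=[4, 3, 3, 4])
```

On this smooth ramp, the two interior gaps are restored exactly. A sample missing from the top row has only one neighbor and is filled only approximately.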

Note that this prioritizing does not prevent the use of two-dimensional prediction 
discussed in an earlier section because when the coding of one line begins, the 
reproducible length of the previous line is known. This defines those samples that can 
be used to aid in prediction since a decoder will also be able to reconstruct them. 

Prioritized LSBs. We will supplement the alternating left-right priorities
described above by adding new conditions involving the least-significant bits of each 
sample in a line. 

The concept we present here is quite simple, though the notation could easily 
grow into unmanageable proportions. To minimize this possibility, we will use a more 
specific limited example and extend earlier notation. 
To do this, we modify and add to Fig. 19, as shown in Fig. 38. Here, as we have
been assuming, X(j) is a complete line of n-bit data instead of a single block. The
Split-Sample parameter is set to 2, so that $L_2^{X}$ is a full line of the two
least-significant bits of each sample of X(j), and $M_X^{n-2}$ is a full line of the (n-2)-bit
most-significant bit samples of X(j).

For the coding of $M_X^{n-2}$, we use a line coder of the form PSI15,K+, as described
in Eq. 110.

Next, we perform another reversible operation on the 2-bit lsb samples of $L_2^{X}$.
First, note that even these two-bit samples exhibit significance. The Split-Sample
operation from (51)-(54) strips off the least-significant bit of each two-bit sample and
places it in the sequence $A_0$ (note that these are also all the lsbs of X(j)). The remaining
bits of $L_2^{X}$ (which are the bits of next significance in the samples of X(j)) are placed in
$A_1$.

The sequences $\Psi_{15,K+}[M_X^{n-2}]$, $A_1$ and $A_0$ are concatenated to produce the
coded line X(j) as

$$\Psi_{?}[X(j)] = \Psi_{15,K+}[M_X^{n-2}] * A_1 * A_0 \qquad (173)$$

By the arguments describing Fig. 19, we know that this coder will perform
almost identically to one that codes all the samples of X(j) directly as

$$\Psi_{15,(K+2)+}[X(j)] \qquad (174)$$

provided that the entropies of the $M_X^{n-2}$ sequences are high enough.
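The bit ordering of (173) can be sketched as follows. PSI15,K+ itself is not reproduced here. The msb coder is passed in as a function, and the stand-in `raw_msbs`, which simply sends each (n-2)-bit msb sample uncoded, is hypothetical and used only to make the layout visible.

```python
def bit_priority_line(line, n, code_msbs):
    """Assemble a coded line in the bit-priority order of Eq. (173):
    code_msbs(M) * A1 * A0, where M holds the (n-2)-bit msb samples,
    A1 the second least-significant bits and A0 the least-significant bits."""
    msbs = [x >> 2 for x in line]
    a1 = "".join(str((x >> 1) & 1) for x in line)
    a0 = "".join(str(x & 1) for x in line)
    return code_msbs(msbs, n - 2) + a1 + a0


def raw_msbs(msbs, width):
    """Hypothetical stand-in for PSI15,K+: each msb sample sent uncoded."""
    return "".join(format(m, f"0{width}b") for m in msbs)


coded = bit_priority_line([5, 6, 7], n=4, code_msbs=raw_msbs)
# msbs '01 01 01', then A1 = '011', then A0 = '101'
```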
Fig. 38. Arranging for Bit Priority


As described earlier, the initial "split" of the two lsbs off the input data shifts the
effective Dynamic Range of the coder used to represent X(j) upward by 2
bits/sample. That is, if the lower end of the Dynamic Range for the $M_X^{n-2}$ coder is at 2.5
bits/sample, the lower end of the Dynamic Range for PSI? in Fig. 38 and Eq. 173 will
be at 4.5 bits/sample. Thus, if entropies lower than this shifted Dynamic Range are not
expected for an application, there is no performance penalty. This can be expected to
be the case for many new instruments, which have upward of n = 12 bit quantization.
The low-end entropies can be 5 or 6 bits/sample. And if we take K = 0 (and X = 1) for 
PSI15,K+ in (173), the lower end of efficient performance is at an entropy of only 

$$H_\delta = 0.5 + 1 + 2 = 3.5 \ \text{bits/sample}$$
well below the minimum expected data entropies just noted. 
Now, consider what this "pre-splitting" does for rate control, assuming the same
FIFO Line Buffer constraints as depicted in Figs. 32 and 33. With the arrangement in
Fig. 38, the input to the FIFO Buffer of length $\ell_F = \rho L$ in Fig. 33 is the coded sequence
PSI?[X(j)] in Eq. 173.

Figure 34 is expanded and modified in Fig. 39 to help see what happens as the
result of this new PSI? and the FIFO buffer limitations. The abscissa has now been
normalized to represent average bits/sample instead of total bits. Again we assume a
buffer length equal to the average coded line length, which has been normalized in
the figure to

$$\Lambda' = \frac{\Lambda}{L/n} \ \text{bits/sample}$$

where $L/n$ is the number of samples per line. The source data are presumed to be n-bit
data, so that the maximum normalized line length is $n\rho_{\max}$ bits/sample. For
discussion purposes, the distribution of normalized line lengths around the average is
shown.
Fig. 39. Example: Normalized Line-Length
Distribution for Bit Priority PSI? of Fig. 38
As in earlier examples, if a normalized coded line length (bits/sample) is shorter
than the buffer length, $\Lambda'$ here, there is no truncation.

As the normalized line length is increased past $\Lambda'$, end bits of $A_0$ will be
deleted. That is, the least-significant bits of one end of such a line will be deleted,
allowing only (n-1)-bit reconstruction (keep in mind that this error represents only one
quantization level out of 4,096 for n = 12 bit data). All of $A_0$ will be lost when normalized
line lengths reach $\Lambda' + 1$ bits/sample.

Similarly, the second lsbs of some or all samples will be lost for normalized line
lengths between $\Lambda' + 1$ and $\Lambda' + 2$ as sequence $A_1$ is affected. With two lsbs gone,
only (n-2)-bit precision reconstruction can be accomplished.

As normalized line lengths exceed $\Lambda' + 2$, the coded sequence $\Psi_{15,K+}[M_X^{n-2}]$
in (173) will be affected, which causes truncation of samples at the ends of lines, as
before. Those samples not truncated can be reconstructed with (n-2)-bit precision. By
alternating the direction of coding, truncated samples can be filled in by interpolation,
as before.
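The progression just described (first $A_0$, then $A_1$, then the msb samples) follows from applying the truncation rule (162) to the layout of (173). The sketch below uses an illustrative line of J = 100 samples whose msbs code to 4.5 bits/sample, so the whole line takes 6.5 bits/sample. `truncation_losses` is a hypothetical name.

```python
def truncation_losses(msb_bits, J, buffer_len):
    """Bits lost from each part of the Eq. (173) layout of a J-sample line:
    msb_bits of coded msbs, then J bits of A1, then J bits of A0, when the
    FIFO passes only the first buffer_len bits."""
    lost, start = {}, 0
    for name, size in (("M", msb_bits), ("A1", J), ("A0", J)):
        kept = min(max(buffer_len - start, 0), size)
        lost[name] = size - kept
        start += size
    return lost


print(truncation_losses(450, 100, 620))   # buffer 6.2 b/s: only A0 bits lost
print(truncation_losses(450, 100, 500))   # buffer 5.0 b/s: half of A1, all of A0
print(truncation_losses(450, 100, 400))   # buffer 4.0 b/s: msb samples truncated too
```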

What we have done in these examples is to identify several different parts of a 
line's representation and order those parts by their relative importance. This ordering 
turns into a relative "priority" for the coded bits when they are loaded (in their order of 
importance) into a FIFO buffer. The priorities in this last example were: 

(1) Samples have decreasing importance from one end of a line to the other. 

(2) The priority direction in (1) above alternates from line to line. 

(3) The next least-significant bits of all samples ($A_1$) are less important than
all the n-2 most-significant bits.

(4) The least-significant bits of all samples in a line ($A_0$) are less important
than all other bits.
The advantages of this "more graceful" rate control are obtained at the cost of
the slightly greater implementation complexity required to reorder the coded data bits 
before loading the FIFO. 

Priority Variations. Certainly there are many variations to this prioritization 
scheme. Among them: 


* $A_0$, $A_1$, etc., could apply only to the edges, which leaves the central
portion error-free.

* The error that results from truncation can be reduced without a
performance penalty by rounding the most-significant bit samples that
are coded by using PSI15,K+. (For a single lsb truncation, add a
pseudorandom binary sequence during reconstruction. This will ensure
that the average error is zero.)

* Resolution can be more directly involved in the planned priorities. A line 
(or portions of a line) could be rearranged before coding into sequences 
of the even samples followed by the odd samples and vice versa, as 
shown in Fig. 40. 

Figure 40 assumes that every line is arranged into odd then even samples.
Once coded, the even samples will be deleted when the FIFO overflows. Assuming
alternating left-right/right-left coding, it is easy to see that

Provided that no more than 1/4 of a line's samples are truncated
(i.e., half the even samples), (175)

then

All deleted samples (except on the image edge) will be completely
surrounded on all sides by samples that were not deleted,
allowing for an accurate interpolation reconstruction. (176)
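Here is a one-line sketch of the odd/even ordering of Fig. 40. It is simplified in two ways: truncation is counted in samples rather than bits, and a missing even sample is interpolated only from its left and right odd neighbors, not from the full 2-D surround of (176). Both function names are hypothetical.

```python
def odd_even_order(line):
    """Fig. 40 order: odd samples o_1, o_2, ... then even samples e_1, e_2, ...
    (counting samples from 1, o_i sits at 0-based index 2i - 2)."""
    return line[0::2] + line[1::2]


def reconstruct_odd_even(received, length):
    """Undo the ordering when only the first len(received) samples survived.
    Missing even samples are interpolated from their odd neighbours, which
    survive as long as no more than half of the samples were truncated."""
    n_odd = (length + 1) // 2
    if len(received) < n_odd:
        raise ValueError("odd samples were lost as well")
    line = [None] * length
    line[0::2] = received[:n_odd]
    for i, e in enumerate(received[n_odd:]):
        line[2 * i + 1] = e
    for p in range(1, length, 2):
        if line[p] is None:
            left = line[p - 1]
            right = line[p + 1] if p + 1 < length else left   # line edge
            line[p] = (left + right) // 2
    return line


line = [10, 12, 14, 16, 18, 20, 22, 24]
coded_order = odd_even_order(line)           # [10, 14, 18, 22, 12, 16, 20, 24]
restored = reconstruct_odd_even(coded_order[:6], len(line))
# 1/4 of the samples lost: [10, 12, 14, 16, 18, 20, 22, 22] (edge sample approximate)
```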
Fig. 40. Ordering Samples by Odd/Even ($o_i$ = ith odd sample, $e_i$ = ith even sample)

By alternating the ordering of odd and even samples as in Fig. 41, condition
(175) above can be replaced by the condition

Provided that no more than 1/2 of a line's samples are truncated
(i.e., all the even or all the odd samples). (177)

Effects on Prediction. A slight penalty must be paid in prediction 
performance. Assuming the simpler case in (175) and (176), prediction of even 
samples is unaffected. But the prediction of odd samples cannot use an adjacent 
(possibly missing) even sample. It must use the previous odd sample and/or the 
corresponding odd sample in the line above. 
Fig. 41. Alternating the Ordering of Samples
by Odd/Even (new coding order before PSI?)


A Multispectral Frame Buffer Definition 

A multispectral scanner will ultimately produce many two-dimensional images, 
one for each spectral band. A multispectral IMAGE is a collection of these 2-D images. 

The HIRIS instrument currently being developed simultaneously produces one 
line at a time for all of its 192 spectral bands. The collection of one line from each band 
is called a FRAME. The collection of many FRAMES constitutes a multispectral IMAGE. 
See Fig. 23 for further clarification. 


Sample Data. HIRIS could generate an extraordinary range of data entropies 
when comparisons are made over all of its spectral bands. Ultimately the actual 
variability will depend on the level of uncorrected sensor noise and/or the method of 
prediction used. Here we will focus on a worst-case entropy variability situation. 

Figure 42 was derived from a sample multispectral image taken with an earlier
224-band instrument called AVIRIS [15]. The flat-field corrected image used a
quantization of 10 bits/sample in each band. The figure shows a plot of observed
entropies for each band obtained from the use of the 2-D predictor described in (112).
The entropy of the average prediction error distribution (over all bands) was

$$H_\delta = 4.9 \ \text{bits/sample} \qquad (178)$$

Performance using an appropriate PSI14,K+ followed this graph closely, which
yielded an average performance for the complete image of

$$\Lambda' = 4.6 \ \text{bits/sample} \qquad (179)$$

Fig. 42. HIRIS Entropy Variability: AVIRIS Image (legend: H(P), entropy of average 2-D
prediction error distribution = 4.9 bits/sample)

Both $H_\delta$ and $\Lambda'$ in (178) and (179) are about 0.3 bit/sample lower than would be
achieved by a 1-D cross-track predictor.

The Buffering Problem. The graph in Fig. 42 represents an average over the
many FRAMES that make up the sample image. FRAME-to-FRAME fluctuations of this
graph would reflect the relatively smaller line-to-line activity variations within one
spectral band.
Here we are interested in constraining rate over a single FRAME. For
discussion purposes, we will assume that the graphs for $\Lambda'$ and $H_\delta$ in Fig. 42 are
representative of a particular FRAME.

The general impact of requiring individual FIFO line buffer rate control
for each spectral band in a FRAME (all fixed to a single line length) should be
evident from Fig. 42. Clearly, with a normalized line length equal to the average
performance $\Lambda'$, there would not only be a lot of truncation (higher activity bands) but
also a lot of filler bits (lower activity bands). Of course, the line buffer length might be
larger than $\Lambda'$, which would at least reduce the truncation problem. But we won't count
on that here.


The priority approaches just discussed are applicable here. In fact, individual 
bands could be assigned different buffer lengths to customize the effect of truncations 
caused by the FIFO buffers being too small. But truncations and other errors would still 
occur that could be avoided with a larger buffer. 

As in earlier discussions, a FRAME Buffer of length

$$\Lambda' \times (\text{number of samples per FRAME}) \ \text{bits} \qquad (180)$$

could hold a complete coded FRAME without error, clearly a significant improvement
over many line buffers of the same total length.

Remember, any buffer we talk about is really a convenient way of expressing a 
bit count constraint placed on a block of data or an interval of time. The buffer itself 
may not need to be physically present. For example, the concept of a FRAME Buffer 
may really mean that a data link allows the transfer of a certain number of bits over the 
time interval needed to generate a FRAME ("Frame Time"). As long as all data can be 
retrieved directly from the instrument, coded and transferred immediately with the 
allowed number of bits, no FRAME Buffer would be necessary. 

A convenient way of achieving this is to fix the number of bits for each spectral
line, with the consequences noted above. We will look for another
way.
Note that the key advantage of a FRAME Buffer is that variable-length lines are
allowed; only the sum of those lengths must fit in the buffer.

If a FIFO FRAME Buffer were really available, a number of FRAME oriented 
priorities would be applicable. For example, reordering the spectral bands by priority 
would result in deletion of the least-important bands if the buffer weren't big enough. 
But this all-or-nothing approach might not be appropriate either. Instead let's consider 
a way to distribute any truncation and/or error losses more evenly over a FRAME. 

Now we again need some additional notation. Let X(j) denote a line from
spectral band j, with uncoded length $\ell(X(j)) = L$, where $j = 1, 2, \ldots, T$. Here we have
used the same notation as in Fig. 33, except that now each line is derived from a
different spectral band. 21

Let

$$L_j = \text{coded length of (a complete) } X(j) \qquad (181)$$

Then the average coded line length is

$$\Lambda = \frac{L}{n}\,\Lambda' = \frac{1}{T}\sum_{j=1}^{T} L_j \ \text{bits/line} \qquad (182)$$

where $\Lambda'$ is the same term as in Fig. 42, and $T\Lambda$ is the minimum-sized FRAME Buffer
that could hold all of the coded X(j).

Let

$$\theta_j = \frac{L_j}{T\Lambda} = \text{fraction of total bits generated by } X(j) \qquad (183)$$

$$B = \text{actual FRAME Buffer length, in bits} \qquad (184)$$


21 Note that T might be variable since some operational modes may delete some of 
the spectral bands from consideration. 
and let

$$\gamma = \frac{B}{T\Lambda} \qquad (185)$$

be the fractional comparison of the actual FRAME Buffer size with the minimum size
that can hold all coded spectral bands.


Now we can allocate the total number of bits available, B, to T FIFO line buffers
according to the priority given by the $\{\theta_j\}$. The buffer sizes are

$$B_j = \theta_j B \ \text{bits} \qquad (186)$$

Data for each coded spectral band will now be controlled by its own
FIFO buffer, instead of a FRAME Buffer.

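And clearly, since $\sum_j \theta_j = 1$,

$$\sum_{j=1}^{T} B_j = B \ \text{bits} \qquad (187)$$

By substituting from (183) and (185), we have

$$B_j = \theta_j B = \frac{L_j}{T\Lambda}\,(T\Lambda\gamma) = \gamma L_j \ \text{bits} \qquad (188)$$

So the LINE Buffer size $B_j$ is the fraction $\gamma$ of the size needed to hold all of a coded X(j).
When $\gamma = 1$,

$$B_j = L_j \quad \text{and} \quad B = T\Lambda \ \text{bits} \qquad (189)$$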

By this process, we have established a set of variable-length FIFO line buffers whose 
individual lengths reflect their relative need for bits, based on the length of coded data 
(an "entropy" criterion). 
When $\gamma \ge 1$, there is no loss. If $\gamma$ is decreased below unity, each spectral band
will begin to experience some truncation (and/or error), since each FIFO line buffer is
no longer big enough to hold a complete coded line.

Thus, we have distributed the effect of FIFO buffer truncation over all the bands
rather than only the most active ones. Of course, the priority scheme used for each spectral
band could still be customized.

Note that if the variable buffer sizes could be set up a priori, real physical LINE
Buffers would not be needed. Each spectral line, X(j), would be
coded on-the-fly with $\theta_j B$ bits. This is not unreasonable, since an entropy graph such
as Fig. 42 is representative, even if it does vary.

Moreover, since the $L_j$ for each spectral band can be expected to be highly correlated
from FRAME to FRAME, the assignment of FIFO LINE Buffer lengths $B_j$ could be made
adaptive by using an $L_j$ calculated from the previous FRAME for each spectral band.
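Here is a minimal sketch of the allocation in (183)-(188), with the $L_j$ taken from the previous FRAME as just suggested. The band lengths and budget are illustrative. Real buffer lengths would also be rounded to whole bits, which this sketch omits. `allocate_line_buffers` is a hypothetical name.

```python
def allocate_line_buffers(prev_coded_lengths, B):
    """Split a FRAME budget of B bits into FIFO LINE Buffer lengths B_j.
    prev_coded_lengths[j] plays the role of L_j (here measured on the
    previous FRAME, as suggested for the adaptive version).
    theta_j = L_j / (T*Lambda), Eq. (183); B_j = theta_j * B, Eq. (186)."""
    total = sum(prev_coded_lengths)          # T * Lambda
    gamma = B / total                        # Eq. (185)
    buffers = [Lj * B / total for Lj in prev_coded_lengths]   # = theta_j * B
    return buffers, gamma


Bj, gamma = allocate_line_buffers([1000, 3000, 6000], B=8000)
# Bj = [800.0, 2400.0, 4800.0], gamma = 0.8, and each B_j = gamma * L_j (188)
```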

Adding Other Priorities. The $\{\theta_j\}$ are priorities based on a criterion of
entropy. Other criteria can also be integrated into the determination of FIFO LINE
Buffer lengths, as we shall illustrate.

Let

$$\beta_j, \quad j = 1, 2, \ldots, T \qquad (190)$$

be a set of spectral band priorities based on some other criteria, perhaps "scientific
importance to the current investigation." Then we must have

$$0 < \beta_j < 1 \quad \text{and} \quad \sum_{j=1}^{T} \beta_j = 1 \qquad (191)$$

To weight the two criteria, we choose $0 < \alpha < 1$ for the $\{\theta_j\}$ set and $(1 - \alpha)$ for the
$\{\beta_j\}$ set. We can now modify the FIFO LINE Buffer assignments in (186) to

$$B_j = [\alpha\theta_j + (1 - \alpha)\beta_j]\, B \ \text{bits} \qquad (192)$$

for spectral band j.

The addition of such priorities can cause the assignment of bits in (192) to 
exceed the amount needed to achieve noiseless coding (Lj in Eq. 181). But this can 
be accounted for by iterating the assignment of bits to redistribute the extra bits to 
other lines. 
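Here is one way the iteration could look. The weights follow (192). The redistribution rule is an assumption, not taken from the text: any band allocated more than its noiseless need $L_j$ is capped, and the excess is shared among the uncapped bands in proportion to their weights. The numbers and the name `blended_allocation` are illustrative.

```python
def blended_allocation(L, beta, alpha, B, max_iter=100):
    """B_j = [alpha*theta_j + (1 - alpha)*beta_j] * B, Eq. (192), followed by a
    redistribution loop: any band given more than its noiseless need L_j is
    capped at L_j and the excess is shared among the remaining bands in
    proportion to their weights (a hypothetical redistribution rule)."""
    total = sum(L)
    w = [alpha * Lj / total + (1 - alpha) * bj for Lj, bj in zip(L, beta)]
    alloc = [wj * B for wj in w]
    for _ in range(max_iter):
        excess = sum(max(a - Lj, 0) for a, Lj in zip(alloc, L))
        if excess <= 1e-9:
            break
        alloc = [min(a, Lj) for a, Lj in zip(alloc, L)]
        open_w = sum(wj for wj, a, Lj in zip(w, alloc, L) if a < Lj)
        if open_w == 0:
            break                       # every band is already noiseless
        alloc = [a + excess * wj / open_w if a < Lj else a
                 for wj, a, Lj in zip(w, alloc, L)]
    return alloc


alloc = blended_allocation(L=[1000, 3000, 6000], beta=[0.6, 0.2, 0.2],
                           alpha=0.5, B=8000)
# band 0 is capped at its 1000 bits; the other 1800 bits go to bands 1 and 2
```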
APPENDIX A


MORE ON THE MAPPING 
OF PREDICTION ERRORS 


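The function that maps prediction errors while placing priority on negative
values over positive is transferred from Eq. 18 for convenience:

$$\mathcal{M}_{-+}(x_j, \hat{x}_j) = \delta_j = \begin{cases} 2\Delta_j & 0 \le \Delta_j \le \theta \\ 2|\Delta_j| - 1 & -\theta \le \Delta_j < 0 \\ \theta + |\Delta_j| & \text{otherwise} \end{cases} \qquad (A\text{-}1)$$

and $\theta$ is defined in Eq. 19.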

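When the positions of positive and negative differences in (16) are reversed,
this mapping becomes

$$\mathcal{M}_{+-}(x_j, \hat{x}_j) = \begin{cases} 2\Delta_j - 1 & 0 < \Delta_j \le \theta \\ -2\Delta_j & -\theta \le \Delta_j \le 0 \\ \theta + |\Delta_j| & \text{otherwise} \end{cases} \qquad (A\text{-}2)$$

If $X$ and $\hat{X}$ are data and prediction sequences, then $\mathcal{M}_{-+}(X, \hat{X})$ means the
application of (A-1) to each sample of $X, \hat{X}$; and similarly for $\mathcal{M}_{+-}(\cdot)$.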


In most situations, the two mappings are statistically equivalent. However,
certain kinds of images exhibiting many long brightness ramps might benefit from one
or the other, or both (an adaptive preprocessor).
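A direct transcription of (A-1) and (A-2), together with a per-sample inverse, makes the two mappings easy to compare. This sketch assumes that Eq. 19 gives $\theta = \min(\hat{x}_j - x_{\min}, x_{\max} - \hat{x}_j)$, with $x_{\min} = 0$ and $x_{\max} = 2^n - 1$ for n-bit data. The function names are hypothetical.

```python
def theta(xhat, n):
    """theta = min(xhat - x_min, x_max - xhat), assuming x_min = 0 and
    x_max = 2**n - 1 for n-bit data."""
    return min(xhat, (1 << n) - 1 - xhat)


def map_mp(x, xhat, n):
    """M_{-+} of Eq. (A-1): small negative errors get the odd values."""
    d, t = x - xhat, theta(xhat, n)
    if 0 <= d <= t:
        return 2 * d
    if -t <= d < 0:
        return 2 * abs(d) - 1
    return t + abs(d)


def map_pm(x, xhat, n):
    """M_{+-} of Eq. (A-2): small positive errors get the odd values."""
    d, t = x - xhat, theta(xhat, n)
    if 0 < d <= t:
        return 2 * d - 1
    if -t <= d <= 0:
        return -2 * d
    return t + abs(d)


def unmap(delta, xhat, n, negative_first=True):
    """Invert M_{-+} (negative_first=True) or M_{+-} for one sample."""
    t, x_max = theta(xhat, n), (1 << n) - 1
    if delta > 2 * t:                  # beyond the interleaved range: one sign only
        d = delta - t
        return xhat + d if xhat + d <= x_max else xhat - d
    if negative_first:                 # even -> d >= 0, odd -> d < 0
        d = delta // 2 if delta % 2 == 0 else -((delta + 1) // 2)
    else:                              # odd -> d > 0, even -> d <= 0
        d = (delta + 1) // 2 if delta % 2 == 1 else -(delta // 2)
    return xhat + d


# 3-bit data, prediction 5: an error of -1 maps to 1 under M_{-+}, 2 under M_{+-}
print(map_mp(4, 5, 3), map_pm(4, 5, 3))   # 1 2
```

For every prediction value, each mapping is a one-to-one correspondence between the $2^n$ possible samples and the integers $0, \ldots, 2^n - 1$. This property is what makes the inverse well defined.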


Relationship between $\mathcal{M}_{-+}$ and $\mathcal{M}_{+-}$

Define the Ones Complement of a sequence of one or more n-bit data samples

$$y = y_1\, y_2\, y_3 \ldots$$

by

$$\mathrm{ONE}(y) = 2^n - 1 - y_1,\; 2^n - 1 - y_2,\; 2^n - 1 - y_3,\; \ldots \qquad (A\text{-}3)$$


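Then, with $x_j$ and $\hat{x}_j$ as individual n-bit samples and their predictions,

$$\mathcal{M}_{-+}(x_j, \hat{x}_j) = \mathcal{M}_{+-}(\mathrm{ONE}(x_j), \mathrm{ONE}(\hat{x}_j)) \qquad (A\text{-}4)$$

and

$$\mathcal{M}_{+-}(x_j, \hat{x}_j) = \mathcal{M}_{-+}(\mathrm{ONE}(x_j), \mathrm{ONE}(\hat{x}_j)) \qquad (A\text{-}5)$$

But more important, by extending this to data and prediction sequences $X$ and $\hat{X}$, we
have

$$\mathcal{M}_{-+}(X, \hat{X}) = \mathcal{M}_{+-}(\mathrm{ONE}(X), \mathrm{ONE}(\hat{X})) \qquad (A\text{-}6)$$

and

$$\mathcal{M}_{+-}(X, \hat{X}) = \mathcal{M}_{-+}(\mathrm{ONE}(X), \mathrm{ONE}(\hat{X})) \qquad (A\text{-}7)$$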


Test Vectors 

Equations (A-6) and (A-7) tell us that a data sequence (and prediction) that
produces a known sequence of $\delta$'s by one mapping can easily be converted to an
alternate data sequence (and prediction) that will generate the same sequence
of $\delta$'s by using the alternate mapping. It is, of course, the $\delta$'s to which adaptive
variable-length coding is applied (Fig. 4). The two complementary data sequences
might serve as test vectors for this coder.

Decoding 

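$X$ and $\hat{X}$ can be retrieved by applying the inverse mapping operations of Eqs.
(A-1), (A-2), (A-4)-(A-7) and any a priori information, and Eq. (115), so that

$$(X, \hat{X}) = \mathrm{inv}\,\mathcal{M}_{-+}\{\mathcal{M}_{-+}(X, \hat{X})\} \qquad (A\text{-}8)$$

and

$$(X, \hat{X}) = \mathrm{inv}\,\mathcal{M}_{+-}\{\mathcal{M}_{+-}(X, \hat{X})\} \qquad (A\text{-}9)$$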
But suppose we apply the +- inverse map to the result of applying the -+ map:

$$(X', \hat{X}') = \mathrm{inv}\,\mathcal{M}_{+-}\{\mathcal{M}_{-+}(X, \hat{X})\} \qquad (A\text{-}10)$$

We don't get the original sequence back, but we know that, by (A-7),

$$\mathcal{M}_{-+}(X, \hat{X}) = \mathcal{M}_{+-}(X', \hat{X}') = \mathcal{M}_{-+}(\mathrm{ONE}(X'), \mathrm{ONE}(\hat{X}')) \qquad (A\text{-}11)$$

so we must have

$$(X, \hat{X}) = (\mathrm{ONE}(X'), \mathrm{ONE}(\hat{X}')) \qquad (A\text{-}12)$$

Ultimately, all this means that a "decoder" using the $\mathcal{M}_{+-}(\cdot)$ map could be
used to decode data that had been generated using the $\mathcal{M}_{-+}(\cdot)$ map, and vice
versa. 22
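The complement relations can be checked end to end with a small sketch. The following choices are assumptions for illustration and are not specified here: a previous-sample predictor, and a first sample sent uncoded as the Reference Sample. As in the preceding sketch, $\theta$ assumes $x_{\min} = 0$ and $x_{\max} = 2^n - 1$. All names are hypothetical.

```python
def theta(xhat, n):
    return min(xhat, (1 << n) - 1 - xhat)   # assumes x_min = 0, x_max = 2**n - 1


def map_sample(x, xhat, n, negative_first):
    """M_{-+} (A-1) if negative_first else M_{+-} (A-2)."""
    d, t = x - xhat, theta(xhat, n)
    if abs(d) > t:
        return t + abs(d)
    if negative_first:
        return 2 * d if d >= 0 else 2 * abs(d) - 1
    return 2 * d - 1 if d > 0 else -2 * d


def unmap_sample(delta, xhat, n, negative_first):
    t, x_max = theta(xhat, n), (1 << n) - 1
    if delta > 2 * t:
        d = delta - t
        return xhat + d if xhat + d <= x_max else xhat - d
    odd = delta % 2 == 1
    if negative_first:
        d = -((delta + 1) // 2) if odd else delta // 2
    else:
        d = (delta + 1) // 2 if odd else -(delta // 2)
    return xhat + d


def ones(seq, n):
    """ONE(), Eq. (A-3)."""
    return [(1 << n) - 1 - y for y in seq]


def encode(xs, n, negative_first):
    """Reference sample plus mapped errors of a previous-sample predictor."""
    return xs[0], [map_sample(x, p, n, negative_first) for p, x in zip(xs, xs[1:])]


def decode(ref, deltas, n, negative_first):
    out = [ref]
    for dlt in deltas:
        out.append(unmap_sample(dlt, out[-1], n, negative_first))
    return out


n, X = 4, [7, 9, 8, 15, 0, 3, 3]
ref, deltas = encode(X, n, negative_first=True)                 # M_{-+} encoder
# Same deltas from the complemented data under M_{+-}, as in (A-6):
same = encode(ones(X, n), n, negative_first=False)[1] == deltas
# An M_{+-} decoder, fed the complemented Reference Sample, returns ONE(X):
Xp = decode((1 << n) - 1 - ref, deltas, n, negative_first=False)
recovered = ones(Xp, n)                                          # (A-12): equals X
```

Complementing the Reference Sample before decoding is what allows the $\mathcal{M}_{+-}$ decoder's predictions to track the complemented data.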




22 Note that any Reference Sample indicated in Chapter IV must also be
complemented before decoding begins.
APPENDIX B


MORE ON THE PSI1,k 
SPLIT-SAMPLE OPTIONS 

Following the same notation as in Chapter III, Split-Sample Code "Option"
PSI1,k is defined in (58) and Fig. 12 by

$$\Psi_{1,k}[\delta^n] = \Psi_1[M^{n,k}] * L_k \qquad (B\text{-}1)$$

where $\delta^n$ is a J-sample block of n-bit samples, $M^{n,k}$ are all the most-significant
(n-k)-bit samples of $\delta^n$, and $L_k$ are all the least-significant k-bit samples of $\delta^n$.

PSI1,k constitutes one of several coding algorithms that are used with adaptive
coder PSI14 or PSI14,K and others. Since decisions that determine which code option
to use are made over a complete J-sample data block, it is natural to arrange the
individual pieces of the coded blocks, $\Psi_1[M^{n,k}]$ and $L_k$, as in (B-1). There are subtle
advantages to certain approaches to implementation and a performance advantage
under certain rate-controlled situations, as discussed in Chapter IV. Consequently, we
have maintained the definition in (B-1) throughout. However, here we'll take a closer
look at this definition.

PUTTING THE PIECES BACK TOGETHER 

We need to expand our notation slightly. Let

$$\delta^n = \delta_1\, \delta_2 \ldots \delta_J \qquad (B\text{-}2)$$

be the sequence of n-bit samples to be coded, and let

$$M^{n,k} = m_1\, m_2 \ldots m_J \qquad (B\text{-}3)$$

denote the sequence of (n-k)-bit most-significant samples of $\delta^n$, and

$$L_k = \mathrm{lsb}_1 * \mathrm{lsb}_2 * \cdots * \mathrm{lsb}_J \qquad (B\text{-}4)$$

denote the corresponding sequence of all the k-bit least-significant bit samples.

Then, by (27), (29) and (B-1), we have the expansion

$$\Psi_{1,k}[\delta^n] = fs[m_1] * fs[m_2] * \cdots * fs[m_J] * \mathrm{lsb}_1 * \mathrm{lsb}_2 * \cdots * \mathrm{lsb}_J \qquad (B\text{-}5)$$

We could instead rearrange (B-5) to

$$\Psi_{1,k}[\delta^n] = fs[m_1] * \mathrm{lsb}_1 * fs[m_2] * \mathrm{lsb}_2 * \cdots * fs[m_J] * \mathrm{lsb}_J$$

or

$$\Psi_{1,k}[\delta^n] = \mathrm{lsb}_1 * fs[m_1] * \mathrm{lsb}_2 * fs[m_2] * \cdots * \mathrm{lsb}_J * fs[m_J] \qquad (B\text{-}6)$$

With this arrangement, it is more obvious that the coding of any individual sample of
$\delta^n$, say $\delta_j$, is the application of a variable-length code, say $C_{n,k}[\cdot]$, where

$$C_{n,k}[\delta_j] = \mathrm{lsb}_j * fs[m_j] \qquad (B\text{-}7)$$

We have merely collected the individual coded pieces of each sample in (B-5).
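The rearrangement from (B-5) to (B-6) can be made concrete with a short sketch. It assumes the Fundamental Sequence codeword $fs[m]$ is m zeros followed by a one. Both orderings produce the same bits in a different order, so the coded lengths are equal. The helper names are hypothetical.

```python
def fs(m):
    """Fundamental Sequence codeword: m zeros followed by a one (assumed)."""
    return "0" * m + "1"


def split(delta, k):
    """(m_j, lsb_j): the msb part as an integer and the k lsbs as a bit string."""
    lsb = format(delta & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return delta >> k, lsb


def psi1k_blocked(block, k):
    """Eq. (B-5): all fs[m_j] first, then all lsb_j."""
    parts = [split(d, k) for d in block]
    return "".join(fs(m) for m, _ in parts) + "".join(lsb for _, lsb in parts)


def psi1k_interleaved(block, k):
    """Eq. (B-6): concatenated per-sample codewords C_{n,k}[delta_j] = lsb_j * fs[m_j]."""
    return "".join(lsb + fs(m) for m, lsb in (split(d, k) for d in block))


block = [5, 2, 0, 7]
a = psi1k_blocked(block, k=1)       # '0010110001' + '1001'
b = psi1k_interleaved(block, k=1)   # '1001' '001' '01' '10001'
```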

Optimality of the Code, $C_{n,k}[\cdot]$

Yeh [11] has shown that the set of simple codes in (B-7) is equivalent to a class
of Huffman codes under the Humblet condition. 23 This is a powerful result because it
explains more directly why the measured performance of each PSI1,k code option is
so good: over a limited entropy range, there is none better.

Multiplexing 

Now, subscript or superscript the n, $\delta^n$, $\delta_j$, $m_j$, $\mathrm{lsb}_j$, and k in the descriptions
above by a, b and c to denote that each block (and its coding) is derived from different
data sources.


23 Assuming the "optimized fs code." 
Clearly, we could mix the coded blocks from the three data sources, using (B-6), as

$$\psi_{1,k_a}[\delta^{n_a}] * \psi_{1,k_b}[\delta^{n_b}] * \psi_{1,k_c}[\delta^{n_c}] \tag{B-8}$$

But we could also go further than this and multiplex the individual codeword pieces
from (B-7), for any $n_a, n_b, n_c, k_a, k_b, k_c$, as

$$(\mathrm{lsb}_1^a * fs[m_1^a]) * (\mathrm{lsb}_1^b * fs[m_1^b]) * (\mathrm{lsb}_1^c * fs[m_1^c]) * \cdots * (\mathrm{lsb}_J^a * fs[m_J^a]) * (\mathrm{lsb}_J^b * fs[m_J^b]) * (\mathrm{lsb}_J^c * fs[m_J^c]) \tag{B-9}$$

without incurring any decoding difficulty, provided that the quantities $n_a, n_b, n_c$ and $k_a, k_b, k_c$
are known to the decoder. Presumably, however, these quantities will change only as a result
of the internal decisions of the three separate adaptive coders employing
PSI1,k code options.²⁴ In that case, three corresponding code identifiers would
precede (B-9) to reveal any changes.
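A minimal sketch of the sample-by-sample interleaving in (B-9) follows. It assumes that fs[m] is m zeros followed by a one and that lsb fields are sent most-significant bit first; the function names and the three example blocks are hypothetical. In this simplified version the decoder consults only $k_a, k_b, k_c$, because the fs codewords here are never truncated.

```python
def fs(m):
    """Fundamental Sequence codeword (assumed form): m zeros followed by a one."""
    return "0" * m + "1"


def codeword(delta, k):
    """C_{n,k}[delta] = lsb * fs[m], as in (B-7)."""
    lsb = format(delta & ((1 << k) - 1), f"0{k}b") if k > 0 else ""
    return lsb + fs(delta >> k)


def read_codeword(bits, pos, k):
    """Decode one C_{n,k} codeword starting at bit position pos; return (sample, next_pos)."""
    lsb = int(bits[pos:pos + k], 2) if k > 0 else 0
    pos += k
    end = bits.index("1", pos)
    return ((end - pos) << k) | lsb, end + 1


def multiplex(blocks, ks):
    """(B-9): sample 1 of every source, then sample 2 of every source, and so on."""
    return "".join(codeword(d, k) for column in zip(*blocks) for d, k in zip(column, ks))


def demultiplex(bits, ks, J):
    """Split a (B-9) stream back into one J-sample block per source, given each k."""
    blocks, pos = [[] for _ in ks], 0
    for _ in range(J):
        for source, k in enumerate(ks):
            sample, pos = read_codeword(bits, pos, k)
            blocks[source].append(sample)
    return blocks


a, b, c = [5, 0, 3, 12], [1, 1, 2, 0], [40, 33, 47, 38]
stream = multiplex([a, b, c], (1, 0, 3))
print(demultiplex(stream, (1, 0, 3), 4) == [a, b, c])
```

The interleaved stream is exactly as long as the three blocks coded separately; only the order of the codewords changes.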


24 For example, three separate PSI14 coders, all with starting option PSI1,0 (i.e.,
PSI1).
APPENDIX C


COMPARING PSI0,1 WITH PSI1,0

By definition,

$$\mathcal{L}(\psi_{0,1}[\delta^n]) = \mathcal{L}(\psi_0[M_{n,1}]) + J \text{ bits} \tag{C-1}$$

where $M_{n,1}$ is the sequence of the most-significant (n-1)-bit samples of $\delta^n$. We wish to
compare this with

$$F_0 = \mathcal{L}(\psi_1[\delta^n]) = \mathcal{L}(\psi_1[M_{n,0}]) \text{ bits} \tag{C-2}$$

By Eq. 80,

$$F_1 = \mathcal{L}(\psi_1[M_{n,1}]) \text{ bits} \tag{C-3}$$

is the length of the Fundamental Sequence for $M_{n,1}$. By Ref. 5,

$$F_1 = \tfrac{1}{2}\left(F_0 + J - \textstyle\sum (\text{lsbs of } \delta^n)\right) \tag{C-4}$$

which gives us Eq. 81 for k = 1 if the split least-significant bits are random.
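Because an fs codeword for m costs m + 1 bits, and splitting off one bit leaves $m_j = (\delta_j - \mathrm{lsb}_j)/2$, summing over the block gives relation (C-4) with equality. The short Python check below assumes that usual FS length of m + 1 bits per sample and confirms the identity on random blocks; the function names are illustrative only.

```python
import random


def fs_length(samples):
    """Length in bits of the Fundamental Sequence: sample m costs m + 1 bits."""
    return sum(m + 1 for m in samples)


def c4_sides(block):
    """Return (F_1, (F_0 + J - sum of lsbs) / 2) for one block with k = 1."""
    J = len(block)
    F0 = fs_length(block)
    F1 = fs_length([d >> 1 for d in block])  # M_{n,1}: drop each sample's lsb
    lsb_sum = sum(d & 1 for d in block)
    return F1, (F0 + J - lsb_sum) / 2


print(c4_sides([5, 0, 3, 12]))

rng = random.Random(0)
for _ in range(1000):
    lhs, rhs = c4_sides([rng.randrange(64) for _ in range(16)])
    assert lhs == rhs
```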

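But from (41),

$$\mathcal{L}(\psi_0[M_{n,1}]) \le \lfloor F_1/3 \rfloor + 2(F_1 - J) \tag{C-5}$$

Then, by using (C-1), we get

$$\mathcal{L}(\psi_{0,1}[\delta^n]) \le \lfloor F_1/3 \rfloor + 2F_1 - J$$

or

$$\mathcal{L}(\psi_{0,1}[\delta^n]) \le \lfloor 7F_1/3 \rfloor - J$$

By substituting (C-4),

$$\mathcal{L}(\psi_{0,1}[\delta^n]) \le \left\lfloor \tfrac{7}{6}\left(F_0 + J - \textstyle\sum \mathrm{lsbs}\right) \right\rfloor - J$$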
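If we ignore the integer requirement, then

$$\mathcal{L}(\psi_{0,1}[\delta^n]) \le \gamma_{0,1} = \tfrac{1}{6}\left[7\left(F_0 + J - \textstyle\sum \mathrm{lsbs}\right) - 6J\right] = \tfrac{1}{6}\left[7F_0 + J - 7\textstyle\sum \mathrm{lsbs}\right] \tag{C-6}$$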

The minimum and maximum values of any $\gamma_{0,1}$ occur when all the split lsbs are
ones or zeroes, respectively, giving us

$$\min \gamma_{0,1} = \tfrac{7}{6}F_0 - J \tag{C-7}$$

$$\max \gamma_{0,1} = \tfrac{7}{6}F_0 + \tfrac{J}{6} \tag{C-8}$$

When the lsbs are distributed randomly, $\gamma_{0,1}$ would lie between these limits, as

$$E\{\gamma_{0,1} \mid \text{random lsbs}\} = \tfrac{7}{6}F_0 - \tfrac{5}{12}J \tag{C-9}$$


These results are plotted in Fig. C-1.

The figure plots $\gamma_{0,1}$ from (C-7) through (C-9) as a function of $F_0$. It also shows $F_0$
plotted against itself, since this is what we wish to compare.

Remember, we are really interested in the average properties of $\gamma_{0,1}$ in (C-6),
that is, $E\{\gamma_{0,1}\}$. The $\gamma_{0,1}$ plots in Fig. C-1 are contingent on specific distributions of the
split least-significant bits: all ones, all zeroes, or random.

Not until entropies start exceeding about 3 bits/sample (corresponding to $F_0 >
3J$) can the distribution of lsbs be considered random. Then $E\{\gamma_{0,1}\}$, and with it the
performance of PSI0,1, will follow the center curve for $\gamma_{0,1}$ (Eq. C-9). But by then, this
curve lies above $F_0$, having crossed it at $F_0 = 5J/2$.
Fig. C-1. $\gamma_{0,1}$ Performance Bounds
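The curves of Fig. C-1 and the crossing at $F_0 = 5J/2$ can be checked with exact rational arithmetic. The sketch below assumes $\gamma_{0,1} = [7F_0 + J - 7\sum \mathrm{lsbs}]/6$, the form consistent with (C-7) through (C-9); the block length J = 16 and the point $F_0 = 3J$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
from fractions import Fraction


def gamma01(F0, J, lsb_sum):
    """gamma_{0,1} = [7 F0 + J - 7 * (sum of split lsbs)] / 6, as in (C-6)."""
    return (7 * Fraction(F0) + J - 7 * Fraction(lsb_sum)) / 6


def bounds(F0, J):
    """(C-7), (C-8), (C-9): lsbs all ones, all zeroes, and random (mean sum J/2)."""
    return gamma01(F0, J, J), gamma01(F0, J, 0), gamma01(F0, J, Fraction(J, 2))


J = 16
low, high, middle = bounds(3 * J, J)
print(low, high, middle)

crossing = Fraction(5, 2) * J  # F0 at which the random-lsb curve meets F0
print(gamma01(crossing, J, Fraction(J, 2)) == crossing)
```

Using `Fraction` avoids any rounding doubt about where the center curve (C-9) meets the line $F_0$; the identity holds for every J, not only the one printed here.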


Although there are sequences that can be coded better with PSI0,1 than with PSI1,
on average PSI1 will do better. Furthermore, once $F_0 > 5J/2$, PSI14 will be choosing
the next Split-Sample mode, PSI1,1, which will outperform PSI1 itself.

As entropies are lowered, the lsbs will tend toward more and more zeroes until,
at zero entropy, all the lsbs MUST BE ZERO. There is only one possibility for $\gamma_{0,1}$, as shown
by point (A) in Fig. C-1. Here $\gamma_{0,1}$ corresponds to $\max \gamma_{0,1}$ in Eq. C-8. Clearly, the real
$E\{\gamma_{0,1}\}$ will gradually move away from point (A) and merge with the graph for random
lsbs. Until this happens, the average performance of PSI1 should be better than that of
PSI0,1.


But we have already considered the higher entropies. Thus, it would appear
that, for all practical purposes, the average performance of PSI1 should always be
better than that of PSI0,1.
APPENDIX D


GAIN/OFFSET SENSOR NOISE 

GAUSSIAN ENTROPY FUNCTION 

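The density function for a Gaussian random variable $\zeta$ is given by the familiar

$$p(\zeta) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(\zeta - m)^2 / 2\sigma^2} \tag{D-1}$$

where

$$m = \text{mean value} \tag{D-2}$$

and

$$\sigma = \text{Standard Deviation} \tag{D-3}$$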

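It can be shown that, for $\sigma > 1$, a good approximation to the entropy of a quantized $\zeta$
is given by the Gaussian entropy function [16]

$$H_G(\sigma) = \tfrac{1}{2}\log_2\left[2\pi e\left(\sigma^2 + \tfrac{1}{12}\right)\right] = 2.047 + \frac{\ln(\sigma^2 + 1/12)}{2\ln 2} \tag{D-4}$$

Since $2\ln 2 \approx 1.386$, the inverse of (D-4) is

$$\sigma = \left(e^{1.386\,[H_G(\sigma) - 2.047]} - \tfrac{1}{12}\right)^{1/2} \tag{D-5}$$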
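For experimentation, a direct Python transcription of (D-4) and its inverse is sketched below; the function names are hypothetical. The inverse uses the identity $e^{1.386x} = 2^{2x}$ (because $2\ln 2 \approx 1.386$). Like (D-4) itself, it is intended for the range $\sigma > 1$, and it refuses entropies so low that the implied variance would be non-positive.

```python
import math

C = 0.5 * math.log2(2 * math.pi * math.e)  # constant term of (D-4), about 2.047 bits


def gaussian_entropy(sigma):
    """(D-4): H_G(sigma) = 1/2 log2[2 pi e (sigma^2 + 1/12)] bits/sample."""
    return 0.5 * math.log2(2 * math.pi * math.e * (sigma ** 2 + 1 / 12))


def gaussian_sigma(h):
    """Inverse of (D-4): e^{1.386 (h - 2.047)} is the same as 2^{2 (h - C)}."""
    variance = 2 ** (2 * (h - C)) - 1 / 12
    if variance <= 0:
        raise ValueError("entropy too low to invert with the Gaussian model")
    return math.sqrt(variance)


for sigma in (1.0, 4.0, 16.0):
    h = gaussian_entropy(sigma)
    print(f"sigma={sigma:5.1f}  H_G={h:.3f}  inverse={gaussian_sigma(h):.3f}")
```

Each doubling of $\sigma$ (well above 1) adds about one bit/sample to $H_G$, which is a quick sanity check on (D-4).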

PREDICTION ENTROPY 

Let $P = \{p_j\}$ denote an observed distribution of prediction error differences, and
let

$$\sigma(P) \tag{D-6}$$

be its calculated standard deviation in data numbers (DN). A direct calculation of the
entropy of P is then, from Eq. (10),

$$H(P) = -\sum_j p_j \log_2 p_j \text{ bits/sample} \tag{D-7}$$

From numerous simulations involving various data sources [12], it was
observed that, for a broad range of differential entropies,

$$H(P) \approx H_G(\sigma(P)) \tag{D-8}$$

Discrepancies of less than 0.1 bit/sample between the two sides of (D-8) were typical. Thus,
the Gaussian model can be assumed to be reasonably accurate in estimating
differential entropies for real quantized data sources.
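The sketch below does not reproduce the simulations of [12]. It only illustrates how closely the direct entropy $H(P)$ and the Gaussian estimate $H_G(\sigma(P))$ agree for an assumed source whose prediction errors are a rounded zero-mean Gaussian. The function names and the chosen $\sigma$ values are hypothetical.

```python
import math


def quantized_gaussian(sigma):
    """Probabilities p_j of round(X), X ~ N(0, sigma^2), over integers |j| <= 10 sigma + 1."""
    def cdf(x):
        return 0.5 * (1 + math.erf(x / (sigma * math.sqrt(2))))
    K = int(10 * sigma) + 1
    return {j: cdf(j + 0.5) - cdf(j - 0.5) for j in range(-K, K + 1)}


def entropy(p):
    """Direct entropy H(P) = -sum_j p_j log2 p_j, in bits/sample."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def std_dev(p):
    """sigma(P), in data numbers."""
    mean = sum(j * q for j, q in p.items())
    return math.sqrt(sum((j - mean) ** 2 * q for j, q in p.items()))


def gaussian_entropy(sigma):
    """Gaussian entropy function (D-4)."""
    return 0.5 * math.log2(2 * math.pi * math.e * (sigma ** 2 + 1 / 12))


for sigma in (1.5, 4.0, 20.0):
    p = quantized_gaussian(sigma)
    print(f"sigma={sigma:5.1f}  H(P)={entropy(p):.3f}  "
          f"H_G(sigma(P))={gaussian_entropy(std_dev(p)):.3f}")
```

For such an idealized source the two values differ by only a few hundredths of a bit, well inside the 0.1 bit/sample discrepancy quoted for real data.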


Introducing Noise 

The sample-to-sample sensor noise effects caused by variations in gain
sensitivity and offset for some modern instruments can also be well modeled as
Gaussian and independent of the real signal.²⁵ We can use these results to obtain a
reasonable measure of the expected effect of this gain/offset noise on code rate. This
impact is of particular interest because this form of noise is potentially
correctable within the instrument data system [17].

Figure D-1 shows the Preprocessor portion of module PSI14,K+ with an input
that includes this sensor noise.

Here, $s_i$ is the real ith signal value and $x_i$ is the corresponding value seen by
the Preprocessor AFTER a Gaussian noise signal, $n_i$, is added to it. The ith difference
signal, $\Delta'_i$, becomes


25 For our purposes here, the real signal may already include other noncorrectable
noise effects, such as shot noise.
[Figure: Gaussian noise $n_i$ added to the signal $s_i$ ahead of the (Standard) Preprocessor]
Fig. D-1. Signal Plus Noise


$$\begin{aligned}
\Delta'_i &= x_i - x_{i-1} \\
&= s_i + n_i - (s_{i-1} + n_{i-1}) \\
&= (s_i - s_{i-1}) + (n_i - n_{i-1}) \\
&= \Delta_i + n'_i
\end{aligned} \tag{D-9}$$

where $\Delta_i$ is the difference signal if noise were not present, and $n'_i = n_i - n_{i-1}$ is a new zero-mean
Gaussian random variable with Standard Deviation $\sigma'_n$ (if successive $n_i$ are independent with
standard deviation $\sigma_n$, then $\sigma'_n = \sqrt{2}\,\sigma_n$). This suggests the statistically
equivalent diagram in Fig. D-2, where we now make use of (D-8) to assume that $\Delta_i$ is a
zero-mean Gaussian random variable with standard deviation $\sigma_s$.
Fig. D-2. Difference Signal Plus Noise
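The difference relation (D-9) can be checked numerically. The sketch below assumes, for illustration only, a random-walk signal (so that the noise-free differences $\Delta_i$ are Gaussian with standard deviation $\sigma_s$) and independent noise samples $n_i$ with standard deviation $\sigma_n$; the parameter values and function name are arbitrary. It shows that $\Delta'_i = \Delta_i + n'_i$ sample by sample, that $\sigma'_n \approx \sqrt{2}\,\sigma_n$, and that the spread of $\Delta'_i$ matches $\sqrt{\sigma_s^2 + \sigma'^2_n}$, as expected for independent Gaussian terms.

```python
import math
import random
import statistics


def simulate(J=100_000, sigma_s=3.0, sigma_n=2.0, seed=1):
    """Return (delta, delta_noisy, n_prime) for an illustrative random-walk signal."""
    rng = random.Random(seed)
    s = [0.0]
    for _ in range(J):
        s.append(s[-1] + rng.gauss(0.0, sigma_s))  # so that Delta_i ~ N(0, sigma_s^2)
    n = [rng.gauss(0.0, sigma_n) for _ in s]        # independent gain/offset noise n_i
    x = [si + ni for si, ni in zip(s, n)]           # values seen by the Preprocessor
    delta = [s[i] - s[i - 1] for i in range(1, len(s))]
    delta_noisy = [x[i] - x[i - 1] for i in range(1, len(x))]  # Delta'_i
    n_prime = [n[i] - n[i - 1] for i in range(1, len(n))]      # n'_i
    return delta, delta_noisy, n_prime


delta, delta_noisy, n_prime = simulate()
print(statistics.pstdev(n_prime), math.sqrt(2) * 2.0)
print(statistics.pstdev(delta_noisy), math.hypot(3.0, math.sqrt(2) * 2.0))
```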


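Here the error signal that eventually gets coded is more clearly the sum of the two
independent zero-mean Gaussian signals $\Delta_i$ and $n'_i$. Summarizing, we have

$$\begin{aligned}
\sigma'_n &= \text{Standard Deviation for Gaussian Noise Signal} \\
\sigma_s &= \text{Standard Deviation for (Gaussian) Signal Predictor Error} \\
\sigma_{s+n} &= \text{Standard Deviation for Gaussian Signal Plus Noise, taken here as } \sigma_s + \sigma'_n
\end{aligned} \tag{D-10}$$

Because $\Delta_i$ and $n'_i$ are independent, the exact standard deviation of their sum is
$\sqrt{\sigma_s^2 + (\sigma'_n)^2}$, which never exceeds $\sigma_s + \sigma'_n$; the sum in (D-10) therefore
gives a conservative (upper-bound) estimate of the entropy added by noise.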


We can now investigate coding efficiency with and without noise by evaluating
the impact on the Gaussian entropy $H_G(\sigma_{s+n}) = H_G(\sigma_s + \sigma'_n)$ for various levels of noise,
as specified by $\sigma'_n$ or $H_G(\sigma'_n)$.

We will not study these effects exhaustively here, since we are not dealing with
any specific instrument; it will suffice to get a feel for them. Plots of $H_G(\sigma_s + \sigma'_n)$
versus $H_G(\sigma_s)$ for various levels of $H_G(\sigma'_n)$ are shown in Fig. D-3. Some similar curves first
appeared in Ref. 1.
Fig. D-3. Signal Plus Noise Entropies


DISCUSSION 

The curve to use for comparison in Fig. D-3 is the 45° line that represents signal
entropy alone (i.e., $H_G(\sigma_s)$ vs $H_G(\sigma_s)$).

As noise is increased above zero, the overall entropy, $H_G(\sigma_s + \sigma'_n)$, increases
at all signal entropy values. But the net increase caused by the noise diminishes
dramatically as the underlying signal entropy increases. For example, when the signal
entropy is $H_G(\sigma_s) = 0$, the impact of a noise signal with 4 bits/sample of entropy
($H_G(\sigma'_n) = 4$) is clearly to increase $H_G(\sigma_s + \sigma'_n)$ from zero to 4. But if the actual signal
entropy is $H_G(\sigma_s) = 6$ bits/sample, the impact of an $H_G(\sigma'_n) = 4$ bits/sample noise signal
is to raise the overall entropy $H_G(\sigma_s + \sigma'_n)$ by a comparatively insignificant 0.3
bit/sample.
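The 0.3 bit/sample example above can be recomputed from (D-4), (D-5) and (D-10). In the sketch below (hypothetical function names), `noise_penalty` converts each entropy to a standard deviation, combines the two by simple addition as in (D-10), and reports the resulting entropy increase. The optional `rss` flag instead combines them as $\sqrt{\sigma_s^2 + \sigma'^2_n}$, the exact value for independent Gaussian terms, which yields a smaller penalty.

```python
import math

C = 0.5 * math.log2(2 * math.pi * math.e)  # about 2.047 bits


def h_g(sigma):
    """Gaussian entropy function (D-4)."""
    return 0.5 * math.log2(2 * math.pi * math.e * (sigma ** 2 + 1 / 12))


def sigma_of(h):
    """Inverse of (D-4), as in (D-5)."""
    return math.sqrt(2 ** (2 * (h - C)) - 1 / 12)


def noise_penalty(h_signal, h_noise, rss=False):
    """Increase of H_G over the noise-free H_G(sigma_s) when noise is added.

    rss=False adds standard deviations, sigma_s + sigma'_n, as in (D-10);
    rss=True uses sqrt(sigma_s^2 + sigma'_n^2) instead.
    """
    s, n = sigma_of(h_signal), sigma_of(h_noise)
    combined = math.hypot(s, n) if rss else s + n
    return h_g(combined) - h_signal


print(f"{noise_penalty(6, 4):.2f}")  # text quotes about 0.3 bit/sample
print(f"{noise_penalty(3, 4):.2f}")  # text quotes about 1.6 bits/sample
print(f"{noise_penalty(6, 4, rss=True):.2f}")
```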
AVIRIS Image


Consider again the plot of prediction entropy for a flat-field-corrected 224-spectral-band
AVIRIS image in Fig. 42. This is essentially a plot of "signal entropy"
over all the individual bands. The entropy of a measured average prediction error
distribution was 4.9 bits/sample. At such a high value, the impact of gain/offset noise
with entropies of as much as 4 bits/sample would, by Fig. D-3, be generally less than
about 0.5 bit/sample. But in reality, the impact would be higher.

Many of the spectral bands exhibit signal entropies much lower than 4.9
bits/sample. For example, at a signal entropy of $H_G(\sigma_s) = 3$ bits/sample, a noise with
$H_G(\sigma'_n) = 4$ bits/sample would increase the overall entropy, $H_G(\sigma_s + \sigma'_n)$, by about 1.6
bits/sample. Clearly, this represents a significant reduction in coding efficiency for the
spectral band that has a 3 bits/sample signal entropy.

Equalizing entropies. Note that the addition of significant levels of noise to
all AVIRIS bands will tend to equalize the individual entropies. Suppose that without
noise, entropies varied from 1 to 6 bits/sample, a spread of 5 bits/sample. With the
addition of noise with, say, $H_G(\sigma'_n) = 5$ bits/sample, the overall entropies would vary
over the range $5.1 < H_G(\sigma_s + \sigma'_n) < 6.6$ bits/sample, a spread of only 1.5 bits/sample.
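The equalizing effect can be reproduced with the same Gaussian model. The sketch below (hypothetical names; band entropies of 1 through 6 bits/sample chosen to match the example) adds $\sigma'_n$ to each band's $\sigma_s$ as in (D-10) and reports the resulting range. The lowest of these signal entropies correspond to $\sigma_s < 1$, outside the range $\sigma > 1$ for which (D-4) was stated, so the low end of the result is only approximate.

```python
import math

C = 0.5 * math.log2(2 * math.pi * math.e)  # about 2.047 bits


def h_g(sigma):
    """Gaussian entropy function (D-4)."""
    return 0.5 * math.log2(2 * math.pi * math.e * (sigma ** 2 + 1 / 12))


def sigma_of(h):
    """Inverse of (D-4), as in (D-5)."""
    return math.sqrt(2 ** (2 * (h - C)) - 1 / 12)


def noisy_entropies(signal_entropies, h_noise):
    """Overall entropies H_G(sigma_s + sigma'_n) for each band, using (D-10)."""
    n = sigma_of(h_noise)
    return [h_g(sigma_of(h) + n) for h in signal_entropies]


bands = [1, 2, 3, 4, 5, 6]  # illustrative noise-free band entropies, bits/sample
noisy = noisy_entropies(bands, 5)
print(f"spread without noise: {max(bands) - min(bands):.1f} bits/sample")
print(f"range with noise: {min(noisy):.2f} to {max(noisy):.2f} bits/sample")
```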